Ok, just a quick post here to verbalize something that bothers me (and everybody else, I suspect): poor documentation of software. Even ‘well’ documented programs have huge issues. For instance, I have been working with edgeR a lot lately. The documentation is pretty good (link), but a lot is still left up to the investigator's imagination. For instance, for the function ‘calcNormFactors’, this is what the manual says:
Description: Calculate normalization factors to scale the raw library sizes.
calcNormFactors(object, method=c("TMM","RLE","upperquartile"), refColumn=NULL, logratioTrim=.3, sumTrim=0.05, doWeighting=TRUE, Acutoff=-1e10, p=0.75)
object: either a matrix of raw (read) counts or a DGEList object
method: method to use to calculate the scale factors
refColumn: column to use as reference for method="TMM"
logratioTrim: amount of trim to use on log-ratios ("M" values) for method="TMM"
sumTrim: amount of trim to use on the combined absolute levels ("A" values) for method="TMM"
doWeighting: logical, whether to compute (asymptotic binomial precision) weights for method="TMM"
Acutoff: cutoff on "A" values to use before trimming for method="TMM"
p: percentile (between 0 and 1) of the counts that is aligned when method="upperquartile"
method=”TMM” is the weighted trimmed mean of M-values (to the reference) proposed by
Robinson and Oshlack (2010), where the weights are from the delta method on Binomial data.
If refColumn is unspecified, the library whose upper quartile is closest to the mean upper quartile is used.
Where do the default values for logratioTrim, sumTrim, etc. come from? Should I change them? To what, how, why, and when? It would be really nice to have some guidance about when to consider changing these, and how to reasonably do so.
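To make the question a bit more concrete, here is a rough Python sketch of what the two trim parameters seem to control, going only by the manual text above. It is not edgeR's code: the function names (trimmed_keep, tmm_factor_sketch) are made up for illustration, the trimming rule is my own simplification, and it skips the doWeighting and Acutoff steps entirely, so treat it as a toy for building intuition rather than a reimplementation.

```python
import math


def trimmed_keep(values, trim):
    """Indices that survive after dropping a `trim` fraction from each tail."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    k = int(math.floor(n * trim))  # how many to drop at each end
    return set(order[k:n - k])


def tmm_factor_sketch(sample, ref, logratio_trim=0.3, sum_trim=0.05):
    """Unweighted trimmed mean of M-values for one sample against a reference.

    Hypothetical helper: a simplified illustration, not edgeR's calcNormFactors.
    """
    n_s, n_r = sum(sample), sum(ref)
    m_vals, a_vals = [], []
    for s, r in zip(sample, ref):
        if s > 0 and r > 0:  # log-ratios are undefined for zero counts
            ps, pr = s / n_s, r / n_r
            m_vals.append(math.log2(ps / pr))        # "M": log fold change
            a_vals.append(0.5 * math.log2(ps * pr))  # "A": average log abundance
    # Keep only genes that survive BOTH the M trim and the A trim.
    keep = trimmed_keep(m_vals, logratio_trim) & trimmed_keep(a_vals, sum_trim)
    if not keep:
        return 1.0
    mean_m = sum(m_vals[i] for i in keep) / len(keep)
    return 2.0 ** mean_m


# Ten genes; one gene is wildly over-represented in the sample.
ref = [100] * 10
sample = [10000] + [100] * 9
default_trim = tmm_factor_sketch(sample, ref)  # trims the outlier away
no_trim = tmm_factor_sketch(sample, ref, logratio_trim=0.0, sum_trim=0.0)
```

In this toy example the default trim throws out the one extreme gene, while setting both trims to 0 lets that gene pull the estimate. So the trims look like a knob for how many "unusual" genes get ignored, which is exactly the kind of thing I wish the manual would spell out, along with when 0.3 and 0.05 stop being sensible.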
Of note, nearly every one of the functions contains poorly documented options, and edgeR is one of the best-documented packages out there.