inmoose.deseq2.replaceOutliers
- inmoose.deseq2.replaceOutliers(obj, trim=0.2, cooksCutoff=None, minReplicates=7, whichSamples=None)
Replace outliers with trimmed mean
Note that this function is called within DESeq(), so it is not necessary to call it on top of a DESeq() call. See the documentation for minReplicatesForReplace in DESeq().

This function replaces outlier counts flagged by extreme Cook’s distances, as calculated by
DESeq(), nbinomWaldTest() or nbinomLRT(), with values predicted by the trimmed mean over all samples (adjusted by the size factor or normalization factor of each sample). This function replaces the counts in the matrix returned by obj.counts() and the Cook’s distances in obj.layers["cooks"]. The original counts are preserved in obj.layers["originalCounts"].

The DESeq() function calculates a diagnostic measure called Cook’s distance for every gene and every sample. The DESeqDataSet.results() function then sets the p-values to NA for genes that contain an outlying count, defined as a count whose Cook’s distance is above a threshold. With many degrees of freedom, i.e. many more samples than parameters to be estimated, it may be undesirable to remove entire genes from the analysis just because their data include a single count outlier. An alternative strategy is to replace the outlier counts with the trimmed mean over all samples, adjusted by the size factor or normalization factor for that sample. This function performs this replacement for the user, for samples that have at least minReplicates replicates (including that sample). For more information on Cook’s distance, see two sections of the module documentation: “Dealing with count outliers” and “Count outlier detection”.
- Parameters:
obj (DESeqDataSet) – a DESeqDataSet that has already been processed by either DESeq(), nbinomWaldTest() or nbinomLRT(), and therefore contains a matrix of Cook’s distances (used to define the outlier counts) in obj.layers["cooks"].
trim (float) – the fraction (0 to 0.5) of observations to be trimmed from each end of the normalized counts for a gene before the mean is computed.
cooksCutoff (float, optional) – the threshold for defining an outlier to be replaced. If None (the default), the 0.99 quantile of the \(F(p, m-p)\) distribution is used, where \(p\) is the number of parameters and \(m\) is the number of samples.
minReplicates (int) – the minimum number of replicate samples (including the sample itself) necessary to consider a sample eligible for replacement. Outlier counts will not be replaced if the sample is in a cell that has fewer than minReplicates replicates.
whichSamples (array-like, optional) – a numeric or logical index specifying which samples should have outliers replaced. If None (the default), this is determined using minReplicates.
- Returns:
the input obj with replaced counts in the slot returned by DESeqDataSet.counts(), and the original counts preserved in obj.layers["originalCounts"].
- Return type:
DESeqDataSet
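The replacement strategy described for this function can be illustrated on plain arrays with NumPy and SciPy. This is a minimal sketch, not inmoose's implementation: it assumes a counts matrix laid out as samples × genes, uses size factors only (no normalization factors), accepts whichSamples only as a boolean mask, and rounds replacement values to the nearest integer. The name replace_outliers_sketch is hypothetical.

```python
import numpy as np
from scipy.stats import trim_mean


def replace_outliers_sketch(counts, size_factors, cooks, cooks_cutoff,
                            trim=0.2, which_samples=None):
    """Replace outlier counts with the size-factor-scaled trimmed mean.

    Hypothetical helper (not part of inmoose).
    counts, cooks : arrays of shape (n_samples, n_genes)  (assumed layout)
    size_factors  : array of shape (n_samples,)
    which_samples : boolean mask of shape (n_samples,); all samples if None
    Returns (new_counts, original_counts).
    """
    counts = np.asarray(counts)
    cooks = np.asarray(cooks, dtype=float)
    size_factors = np.asarray(size_factors, dtype=float)
    if which_samples is None:
        which_samples = np.ones(counts.shape[0], dtype=bool)
    which_samples = np.asarray(which_samples, dtype=bool)

    # Normalized counts: divide each sample (row) by its size factor.
    normalized = counts / size_factors[:, None]
    # Per-gene trimmed mean: drop the fraction `trim` from each end first.
    trimmed = trim_mean(normalized, proportiontocut=trim, axis=0)
    # Expected raw count for every sample and gene, rounded to an integer.
    replacement = np.rint(np.outer(size_factors, trimmed)).astype(counts.dtype)

    # Replace only entries above the Cook's cutoff, and only in eligible samples.
    mask = (cooks > cooks_cutoff) & which_samples[:, None]
    new_counts = counts.copy()
    new_counts[mask] = replacement[mask]
    return new_counts, counts.copy()


# Usage: 8 samples x 2 genes, one extreme count for gene 0 in sample 5.
counts = np.array([[10, 5], [12, 5], [11, 5], [9, 5],
                   [10, 5], [500, 5], [11, 5], [10, 5]])
cooks = np.zeros(counts.shape)
cooks[5, 0] = 10.0
new_counts, original = replace_outliers_sketch(
    counts, np.ones(8), cooks, cooks_cutoff=3.0)
print(new_counts[5, 0])  # 11: trimmed mean 64/6 (9 and 500 trimmed), rounded
```

Because the trimmed mean is computed over all samples, the outlier itself is included before trimming; with trim=0.2 and 8 samples, one value is dropped from each end, which is enough to discard the single extreme count here.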
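The defaults for cooksCutoff and whichSamples can be sketched in the same way. This sketch assumes that \(p\) is the number of columns of the design (model) matrix and that a sample’s “cell” is the set of samples sharing an identical row of that matrix, counting the sample itself; both helper names, default_cooks_cutoff and eligible_samples, are hypothetical and not part of inmoose.

```python
import numpy as np
from scipy.stats import f


def default_cooks_cutoff(n_samples, n_params, q=0.99):
    """Return the q-quantile of the F(p, m - p) distribution (hypothetical helper)."""
    if n_samples <= n_params:
        raise ValueError("need more samples than model parameters")
    return f.ppf(q, n_params, n_samples - n_params)


def eligible_samples(design, min_replicates=7):
    """Boolean mask of samples whose cell has >= min_replicates samples.

    A cell is assumed to be the set of samples with identical rows in the
    design matrix; each sample counts towards its own cell.
    """
    design = np.asarray(design)
    # same[i, j] is True when samples i and j have identical design rows.
    same = (design[:, None, :] == design[None, :, :]).all(axis=2)
    cell_sizes = same.sum(axis=1)
    return cell_sizes >= min_replicates


# Usage: intercept + one condition column; 7 samples in A, 3 in B.
design = np.column_stack([np.ones(10), [0] * 7 + [1] * 3])
print(default_cooks_cutoff(n_samples=10, n_params=design.shape[1]))  # about 8.65
print(eligible_samples(design, min_replicates=7))  # True for A, False for B
```

With the default minReplicates of 7, only the samples in group A would be eligible in this example, so an outlier in one of the three B samples would be left in place.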