inmoose.deseq2.DESeqDataSet.DESeqDataSet.estimateDispersions

DESeqDataSet.estimateDispersions(fitType='parametric', maxit=100, useCR=True, weightThreshold=0.01, quiet=False, modelMatrix=None, minmu=None)

Estimate the dispersions for a DESeqDataSet

This function obtains dispersion estimates for Negative Binomial distributed data. The function is typically called with the idiom: dds = dds.estimateDispersions().
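As background, and assuming the usual Negative Binomial parameterization (the symbols \(K\) for a count, \(\mu\) for its expected value and \(\alpha\) for the dispersion are introduced here for illustration only), the dispersion describes how far the variance exceeds the Poisson variance:

\[
\operatorname{Var}(K) = \mu + \alpha \, \mu^{2}
\]

With \(\alpha = 0\) this reduces to the Poisson case, where the variance equals the mean.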

The fitting proceeds as follows:

  • for each gene, an estimate of the dispersion is found which maximizes the Cox-Reid adjusted profile likelihood. Cox-Reid adjusted profile likelihood maximization for dispersion estimation in RNA-Seq data was developed by [McCarthy2012] and first implemented in the edgeR package in 2010 (see inmoose.edgepy.glmFit() and inmoose.edgepy.dispCoxReid());

  • a trend line capturing the dispersion-mean relationship is fit to the gene-wise maximum likelihood estimates;

  • a normal prior is determined for the log dispersion estimates, centered on the predicted value from the trended fit, with variance equal to the difference between the observed variance of the log dispersion estimates and the expected sampling variance;

  • finally, maximum a posteriori (MAP) dispersion estimates are returned.

These final dispersion estimates are used in subsequent tests and can be accessed through DESeqDataSet.dispersions. The fitted dispersion-mean relationship is also used in varianceStabilizingTransformation(). All of the intermediate values (gene-wise dispersion estimates, fitted dispersion estimates from the trended fit, etc.) are stored in dds.var.

The log normal prior on the dispersion parameter has been proposed by [Wu2012].
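The prior and shrinkage steps described above can be illustrated with a minimal sketch in plain Python. It is not the inmoose implementation: the actual MAP step works on the adjusted likelihood, whereas this sketch assumes that each gene-wise log dispersion is approximately Gaussian around its true value, so that the posterior mode reduces to a precision-weighted average. The function names, the `sampling_var` argument and the `floor` on the prior variance are hypothetical and exist only for this illustration.

```python
from statistics import variance


def prior_variance(log_disp_gene, log_disp_fit, sampling_var, floor=0.25):
    """Prior variance of the log dispersions around the trend.

    Computed as the observed variance of the log residuals minus the
    expected sampling variance. `floor` is an assumption of this sketch
    that keeps the result strictly positive.
    """
    residuals = [g - f for g, f in zip(log_disp_gene, log_disp_fit)]
    observed_var = variance(residuals)  # sample variance (n - 1 denominator)
    return max(observed_var - sampling_var, floor)


def shrink_log_dispersion(log_gene, log_fit, sampling_var, prior_var):
    """Posterior mode under a normal prior, assuming a Gaussian likelihood."""
    # Weight given to the gene-wise estimate; the rest goes to the trend.
    weight = prior_var / (prior_var + sampling_var)
    return weight * log_gene + (1.0 - weight) * log_fit
```

With a large prior variance the result stays close to the gene-wise estimate; with a small one it is pulled toward the trended value.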

See also

estimateDispersionsGeneEst

lower-level function called by this function

estimateDispersionsFit

lower-level function called by this function

estimateDispersionsMAP

lower-level function called by this function

Parameters:
  • obj (DESeqDataSet) – the input dataset

  • fitType ("parametric", "local", "mean" or "glmGamPoi") –

    the type of fit of the dispersions to the mean intensity.

    "parametric" - fit a dispersion-mean relation of the form: \(dispersion = asymptDisp + extraPois / mean\) via a robust gamma-family GLM. The coefficients asymptDisp and extraPois are given in the attribute coefficients of the DESeqDataSet.dispersionFunction.

    "local" - use the locfit package to fit a local regression of log dispersions over log base mean (normal-scale means and dispersions are the input and output of DESeqDataSet.dispersionFunction). The points are weighted by normalized mean count in the local regression.

    "mean" - use the mean of gene-wise dispersion estimates.

    "glmGamPoi" - use the glmGamPoi package to fit the gene-wise dispersion, its trend, and calculate the MAP based on the quasi-likelihood framework. The trend is calculated using a local median regression.

  • maxit (int) – maximum number of iterations to allow for convergence

  • useCR (bool) – whether to use Cox-Reid adjustment (see [McCarthy2012])

  • weightThreshold (float) – threshold for subsetting the design matrix and GLM weights for calculating the Cox-Reid correction

  • quiet (bool) – whether to suppress the messages printed at each step

  • modelMatrix (array-like, optional) – an optional matrix which will be used for fitting the expected counts. By default, the model matrix is constructed from DESeqDataSet.design.

  • minmu (float) – lower bound on the estimated count for fitting gene-wise dispersion
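As an illustration of the "parametric" option of fitType, the following plain-Python sketch evaluates a trend of the form dispersion = a0 + a1 / mean and fits its two coefficients with a gamma-family GLM by iteratively reweighted least squares. It is a sketch under stated assumptions, not the library's code: it assumes an identity link (which matches a form that is linear in 1 / mean), it omits the outlier handling that makes the real fit robust, and the names `dispersion_trend`, `fit_dispersion_trend`, `a0`, `a1` and the starting values are hypothetical.

```python
def dispersion_trend(mean, a0, a1):
    """Evaluate the parametric trend a0 + a1 / mean."""
    return a0 + a1 / mean


def fit_dispersion_trend(means, disps, n_iter=10):
    """Fit (a0, a1) by IRLS for a gamma GLM with identity link.

    With an identity link the working response equals the observed
    dispersion, and the IRLS weights are 1 / mu**2 (gamma variance function).
    """
    a0, a1 = 0.1, 1.0  # starting values: an assumption of this sketch
    xs = [1.0 / m for m in means]
    for _ in range(n_iter):
        mus = [a0 + a1 * x for x in xs]
        if min(mus) <= 0.0:
            raise ValueError("fitted dispersions must stay positive")
        ws = [1.0 / mu**2 for mu in mus]
        # Weighted least squares of disps on [1, x], solved in closed form.
        s_w = sum(ws)
        s_x = sum(w * x for w, x in zip(ws, xs))
        s_xx = sum(w * x * x for w, x in zip(ws, xs))
        s_y = sum(w * y for w, y in zip(ws, disps))
        s_xy = sum(w * x * y for w, x, y in zip(ws, xs, disps))
        det = s_w * s_xx - s_x**2
        a0 = (s_xx * s_y - s_x * s_xy) / det
        a1 = (s_w * s_xy - s_x * s_y) / det
    return a0, a1


# Noise-free toy data generated from known coefficients.
means = [5.0, 20.0, 100.0, 1000.0]
disps = [dispersion_trend(m, 0.05, 2.0) for m in means]
a0, a1 = fit_dispersion_trend(means, disps)
```

On these noise-free toy values the fit recovers the coefficients used to generate them. For large means the trend flattens toward `a0`, while `a1 / mean` dominates at low counts.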

Returns:

the input obj, with the dispersion information filled in as fields of DESeqDataSet.var; the final dispersions are accessible via DESeqDataSet.dispersions.

Return type:

DESeqDataSet