inmoose.edgepy.glmQLFit

inmoose.edgepy.glmQLFit(y, design=None, dispersion=None, offset=None, lib_size=None, weights=None, abundance_trend=True, AveLogCPM=None, robust=False, winsor_tail_p=(0.05, 0.1))

Fit a quasi-likelihood negative binomial generalized log-linear model to count data.

Implements one of the quasi-likelihood (QL) methods of [Lund2012], with some enhancements and with slightly different GLM, trend and FDR methods. See [Lun2016] or [Chen2016] for tutorials describing the use of glmQLFit() and glmQLFTest() as part of a complete pipeline. Another case study using glmQLFit() and glmQLFTest() is given in Section 4.7 of the edgeR User’s Guide.
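In the QL negative binomial model, the variance of a count is the negative binomial variance mu + phi * mu^2 (phi being the global NB dispersion) multiplied by a gene-specific QL dispersion. The standalone sketch below only evaluates that variance function. The name ql_nb_variance is hypothetical and is not part of inmoose.

```python
def ql_nb_variance(mu, nb_dispersion, ql_dispersion):
    """Variance of a count under a quasi-likelihood negative binomial model.

    mu: expected count; nb_dispersion: NB dispersion phi (trended or common);
    ql_dispersion: gene-specific QL dispersion sigma^2 (sigma^2 = 1 gives
    back the plain NB variance).
    """
    if mu < 0 or nb_dispersion < 0 or ql_dispersion <= 0:
        raise ValueError("need mu >= 0, nb_dispersion >= 0 and ql_dispersion > 0")
    nb_variance = mu + nb_dispersion * mu ** 2  # NB variance function
    return ql_dispersion * nb_variance           # scaled by the QL dispersion


# Mean 20, NB dispersion 0.25, QL dispersion 1.5: 1.5 * (20 + 0.25 * 400)
print(ql_nb_variance(20.0, 0.25, 1.5))  # 180.0
```

Because phi is shared across genes, it is the QL dispersion that absorbs gene-level variability beyond the NB variance.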

glmQLFit() is similar to glmFit() except that it also estimates QL dispersion values. It calls the limma function squeezeVar() to conduct empirical Bayes moderation of the genewise QL dispersions. If robust=True, then the robust hyperparameter estimation features of squeezeVar() are used [Phipson2016]. If abundance_trend=True, then a prior trend is estimated based on the average logCPMs.
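The empirical Bayes step shrinks each genewise QL dispersion towards a prior value, weighting the prior by its degrees of freedom. squeezeVar() estimates the hyperparameters (var_prior, df_prior) from the data; the sketch below assumes they are already known and shows only the posterior formula var_post = (df_prior * var_prior + df * s2) / (df_prior + df). The function names are hypothetical. A per-gene var_prior stands in for an abundance trend, and a per-gene df_prior for the robust case.

```python
import math


def _per_gene(value, n):
    """Broadcast a scalar to one value per gene, or check a per-gene list."""
    if isinstance(value, (int, float)):
        return [float(value)] * n
    if len(value) != n:
        raise ValueError("per-gene values must have one entry per gene")
    return [float(v) for v in value]


def squeeze_ql_dispersions(s2, df_residual, var_prior, df_prior):
    """Posterior QL dispersions for fixed (already estimated) hyperparameters."""
    n = len(s2)
    df_residual = _per_gene(df_residual, n)
    var_prior = _per_gene(var_prior, n)
    df_prior = _per_gene(df_prior, n)
    var_post = []
    for s, d, s0, d0 in zip(s2, df_residual, var_prior, df_prior):
        if math.isinf(d0):
            var_post.append(s0)  # infinite prior df: posterior equals the prior
        else:
            var_post.append((d0 * s0 + d * s) / (d0 + d))  # weighted average
    return var_post


print(squeeze_ql_dispersions([0.5, 1.0, 4.0], df_residual=4, var_prior=1.0, df_prior=4))
# [0.75, 1.0, 2.5]
```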

glmQLFit() gives special attention to handling of zero counts, and in particular to situations when fitted values of zero provide no useful residual degrees of freedom for estimating the QL dispersion [Lun2017]. The usual residual degrees of freedom are returned as df_residual while the adjusted residual degrees of freedom are returned as df_residual_zeros.
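For the simplest case, a one-way layout with one coefficient per group, the zero adjustment reduces to counting: a group whose counts are all zero has fitted values of zero, so its samples (and its coefficient) contribute nothing to the residual degrees of freedom. The hand-rolled sketch below covers only that case and its function name is hypothetical. For general designs, the adjustment is understood to set aside observations with effectively zero fitted values and recompute the residual degrees of freedom from the rank of the remaining rows of the design matrix.

```python
def residual_df_one_way(counts, groups):
    """Usual and zero-adjusted residual df for one gene in a one-way layout.

    counts: the gene's count in each sample; groups: group label per sample.
    Assumes a full-rank design with one coefficient per group.
    """
    if len(counts) != len(groups):
        raise ValueError("need one group label per count")
    sizes = {}       # number of samples in each group
    nonzero = set()  # groups with at least one positive count
    for c, g in zip(counts, groups):
        sizes[g] = sizes.get(g, 0) + 1
        if c > 0:
            nonzero.add(g)
    df_residual = len(counts) - len(sizes)  # n samples minus n coefficients
    # All-zero groups are dropped together with their coefficient
    df_residual_zeros = sum(sizes[g] - 1 for g in nonzero)
    return df_residual, df_residual_zeros


counts = [5, 8, 3, 0, 0, 0, 12, 9, 0]
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
print(residual_df_one_way(counts, groups))  # (6, 4): group B is all zero
```

Note that group C keeps its two residual degrees of freedom: a single zero count does not make the group's fitted value zero.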

Note

The negative binomial dispersions (the dispersion argument) supplied to glmQLFit() and glmQLFTest() must be based on a global model; that is, they must be either trended or common dispersions. Supplying genewise dispersions is incorrect, because glmQLFTest() estimates genewise variability through the QL dispersion.
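Following this note and the order of precedence given for the dispersion parameter below (trended, then common, then 0.05), the choice of NB dispersion can be sketched as a small lookup that never falls back to genewise estimates. The helper name and the dictionary keys are hypothetical and chosen for illustration only.

```python
DEFAULT_DISPERSION = 0.05  # fallback value stated in the parameter description


def resolve_nb_dispersion(estimates):
    """Pick the NB dispersion for a QL fit from a dict of stored estimates.

    Hypothetical keys: "trended_dispersion", "common_dispersion",
    "tagwise_dispersion". Genewise (tagwise) values are deliberately ignored,
    because the QL dispersion already models genewise variability.
    """
    for key in ("trended_dispersion", "common_dispersion"):
        value = estimates.get(key)
        if value is not None:
            return value
    return DEFAULT_DISPERSION


print(resolve_nb_dispersion({"common_dispersion": 0.1, "tagwise_dispersion": [0.2, 0.3]}))  # 0.1
```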

Parameters:
  • y (matrix) – a matrix of counts

  • design (matrix, optional) – design matrix for the genewise linear models

  • dispersion (float or array_like) – scalar, vector or matrix of negative binomial dispersions. If None, it will be extracted from the DGEList object, with order of precedence: trended dispersions, common dispersion, a constant value of 0.05.

  • abundance_trend (bool) – whether to allow an abundance-dependent trend when estimating the prior values for the quasi-likelihood multiplicative dispersion parameter.

  • robust (bool) – whether to estimate the prior QL dispersion distribution robustly

  • winsor_tail_p (pair of floats) – pair of floats giving the proportions to trim (Winsorize) from the lower and upper tails of the distribution of genewise deviances when estimating the hyperparameters. Positive values make the empirical Bayes estimation robust to outlying small or large deviances. Only used when robust=True.
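To make the meaning of winsor_tail_p concrete, here is a generic, rank-based Winsorizing sketch in plain Python; the function winsorize is hypothetical. It is not limma's implementation, whose robust hyperparameter estimation differs in its details, so it only illustrates what the two tail proportions control.

```python
import math


def winsorize(values, tail_p=(0.05, 0.1)):
    """Rank-based Winsorizing of a list of numbers.

    tail_p = (lower, upper): the floor(n * lower) smallest values are raised
    to the smallest retained value and the floor(n * upper) largest values
    are lowered to the largest retained value. Input order is preserved.
    """
    lower_p, upper_p = tail_p
    if not (0 <= lower_p < 1 and 0 <= upper_p < 1 and lower_p + upper_p < 1):
        raise ValueError("tail proportions must lie in [0, 1) and sum to less than 1")
    n = len(values)
    if n == 0:
        return []
    ordered = sorted(values)
    n_low = math.floor(n * lower_p)   # how many values to pull up
    n_high = math.floor(n * upper_p)  # how many values to pull down
    low_cut = ordered[n_low]
    high_cut = ordered[n - 1 - n_high]
    return [min(max(v, low_cut), high_cut) for v in values]


deviances = [0.2, 3.1, 2.7, 2.9, 3.3, 2.8, 3.0, 3.2, 2.6, 9.5]
print(winsorize(deviances, tail_p=(0.1, 0.1)))
# 0.2 is raised to 2.6 and 9.5 is lowered to 3.3; the rest are unchanged
```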

Returns:

object with the same components as produced by glmFit(), plus:

  • df_residual_zeros, an array containing the number of effective residual degrees of freedom for each gene, taking into account any treatment groups with all zero counts.

  • df_prior, a float (if robust=False) or array (if robust=True), giving the prior degrees of freedom for the QL dispersions.

  • var_prior, a float (if robust=False) or array (if robust=True), giving the location of the prior distribution for the QL dispersions.

  • var_post, an array containing the posterior empirical Bayes QL dispersions.

Return type:

DGEGLM
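The components var_post, df_prior and df_residual_zeros are what a subsequent glmQLFTest() consumes. In the classic edgeR formulation, the likelihood ratio statistic per tested coefficient is divided by the posterior QL dispersion, and the result is referred to an F distribution whose denominator degrees of freedom combine prior and residual information. The sketch below assumes that formulation, uses a hypothetical function name, and omits refinements such as capping the denominator degrees of freedom.

```python
def ql_f_statistic(lr, df_test, var_post, df_prior, df_residual_zeros):
    """Quasi-F statistic and its degrees of freedom for one gene.

    lr: likelihood ratio statistic of the NB GLM comparison;
    df_test: number of coefficients tested (numerator df).
    """
    if df_test <= 0 or var_post <= 0:
        raise ValueError("df_test and var_post must be positive")
    f_stat = lr / df_test / var_post          # LR per df, scaled by QL dispersion
    df_total = df_prior + df_residual_zeros   # denominator df
    return f_stat, df_test, df_total


print(ql_f_statistic(lr=12.0, df_test=2, var_post=1.5, df_prior=8.0, df_residual_zeros=4))
# (4.0, 2, 12.0)
```

If SciPy is available, the corresponding p-value is the upper tail of that F distribution, for example scipy.stats.f.sf(f_stat, df_test, df_total).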