inmoose.edgepy.glmQLFit
- inmoose.edgepy.glmQLFit(y, design=None, dispersion=None, offset=None, lib_size=None, weights=None, abundance_trend=True, AveLogCPM=None, robust=False, winsor_tail_p=(0.05, 0.1))
Fit a quasi-likelihood negative binomial generalized log-linear model to count data.
Implement one of the quasi-likelihood (QL) methods of [Lund2012], with some enhancements and with slightly different glm, trend and FDR methods. See [Lun2016] or [Chen2016] for tutorials describing the use of glmQLFit() and glmQLFTest() as part of a complete pipeline. Another case study using glmQLFit() and glmQLFTest() is given in Section 4.7 of the edgeR User’s Guide.
glmQLFit() is similar to glmFit() except that it also estimates QL dispersion values. It calls the limma function squeezeVar() to conduct empirical Bayes moderation of the genewise QL dispersions. If robust=True, then the robust hyperparameter estimation features of squeezeVar() are used [Phipson2016]. If abundance_trend=True, then a prior trend is estimated based on the average logCPMs.
glmQLFit() gives special attention to handling of zero counts, and in particular to situations when fitted values of zero provide no useful residual degrees of freedom for estimating the QL dispersion [Lun2017]. The usual residual degrees of freedom are returned as df_residual while the adjusted residual degrees of freedom are returned as df_residual_zeros.
Note
The negative binomial dispersions (the dispersion argument) supplied to glmQLFit() and glmQLFTest() must be based on a global model, that is, they must be either trended or common dispersions. It is not correct to supply genewise dispersions because glmQLFTest() estimates genewise variability using the QL dispersion.
- Parameters:
y (matrix) – a matrix of counts
design (matrix, optional) – design matrix for the genewise linear models
dispersion (float or array_like) – scalar, vector or matrix of negative binomial dispersions. If None, it will be extracted from the DGEList object, with order of precedence: trended dispersions, common dispersion, a constant value of 0.05.
abundance_trend (bool) – whether to allow an abundance-dependent trend when estimating the prior values for the quasi-likelihood multiplicative dispersion parameter.
robust (bool) – whether to estimate the prior QL dispersion distribution robustly
winsor_tail_p (pair of floats) – proportions to trim (Winsorize) from the lower and upper tails of the distribution of genewise deviances when estimating the hyperparameters. Positive values produce robust empirical Bayes estimation that ignores outlying small or large deviances. Only used when robust=True.
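To make the meaning of the two tail proportions concrete, the following minimal sketch clamps a vector of values at its lower and upper quantiles. It only illustrates what "Winsorize" means for a pair such as (0.05, 0.1); it is not the robust hyperparameter estimation performed by squeezeVar(), and the helper name winsorize and the sample values are hypothetical.

```python
import numpy as np


def winsorize(values, tail_p=(0.05, 0.1)):
    """Clamp values to the [lower_p, 1 - upper_p] quantile range.

    Hypothetical helper for illustration only; it is not part of inmoose.
    """
    values = np.asarray(values, dtype=float)
    lower_p, upper_p = tail_p
    # Quantile below which values are pulled up.
    lo = np.quantile(values, lower_p)
    # Quantile above which values are pulled down.
    hi = np.quantile(values, 1.0 - upper_p)
    return np.clip(values, lo, hi)


# Made-up deviances with one very small and one very large outlier.
deviances = np.array([0.01, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 50.0])
clamped = winsorize(deviances, tail_p=(0.05, 0.1))
print(clamped)
```

The extreme values are replaced by the quantile bounds rather than discarded, so the number of genes contributing to the estimate stays the same while their influence is limited.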
- Returns:
object with the same components as produced by glmFit(), plus:
df_residual_zeros, an array containing the number of effective residual degrees of freedom for each gene, taking into account any treatment groups with all zero counts.
df_prior, a float (if robust=False) or array (if robust=True), giving the prior degrees of freedom for the QL dispersions.
var_prior, a float (if robust=False) or array (if robust=True), giving the location of the prior distribution for the QL dispersions.
var_post, an array containing the posterior empirical Bayes QL dispersions.
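The relationship between df_prior, var_prior and var_post can be sketched with the standard empirical Bayes shrinkage formula used by limma's squeezeVar(): each posterior value is a weighted average of the genewise estimate and the prior location, weighted by their degrees of freedom. The function below is a minimal illustration under that assumption, for finite df_prior only; moderate_variances and the input numbers are hypothetical and are not part of the inmoose API.

```python
import numpy as np


def moderate_variances(s2, df_residual, var_prior, df_prior):
    """Shrink genewise variance estimates towards a prior value.

    Hypothetical helper for illustration only. Assumes df_prior is finite;
    var_prior and df_prior may be scalars or per-gene arrays (broadcasting).
    """
    s2 = np.asarray(s2, dtype=float)
    df_residual = np.asarray(df_residual, dtype=float)
    # Weighted average of prior and genewise estimates, weighted by df.
    return (df_prior * var_prior + df_residual * s2) / (df_prior + df_residual)


# Made-up genewise QL dispersion estimates and residual degrees of freedom.
s2 = np.array([0.5, 1.0, 4.0])
df_residual = np.array([4.0, 4.0, 4.0])
var_post = moderate_variances(s2, df_residual, var_prior=1.0, df_prior=4.0)
print(var_post)
```

With equal prior and residual degrees of freedom, each posterior value lies halfway between the genewise estimate and the prior location. A larger df_prior pulls the estimates more strongly towards var_prior.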
- Return type:
DGEGLM
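The idea behind df_residual_zeros, described above, can be illustrated for a simple one-way layout: samples from a group whose counts are all zero have fitted values of zero and contribute no information about variability, so they are dropped before counting residual degrees of freedom. The sketch below is a simplified illustration under that assumption, not the library's implementation; residual_df_with_zeros, its groups argument and the example counts are hypothetical.

```python
import numpy as np


def residual_df_with_zeros(counts, design, groups):
    """Per-gene residual df after dropping samples from all-zero groups.

    Hypothetical helper for illustration only. Assumes a one-way layout in
    which a group with all-zero counts has fitted values of zero.
    """
    counts = np.asarray(counts)
    design = np.asarray(design, dtype=float)
    groups = np.asarray(groups)
    n_genes, n_samples = counts.shape
    df = np.empty(n_genes, dtype=int)
    for g in range(n_genes):
        # Flag samples belonging to groups whose counts are all zero.
        zero_fit = np.zeros(n_samples, dtype=bool)
        for level in np.unique(groups):
            in_group = groups == level
            if np.all(counts[g, in_group] == 0):
                zero_fit[in_group] = True
        kept = ~zero_fit
        if not kept.any():
            # Every sample has a zero fitted value: nothing left to estimate from.
            df[g] = 0
        else:
            # Remaining samples minus the rank of the remaining design rows.
            df[g] = kept.sum() - np.linalg.matrix_rank(design[kept])
    return df


# Two groups of three samples, with one indicator column per group.
groups = np.array([0, 0, 0, 1, 1, 1])
design = np.column_stack([groups == 0, groups == 1]).astype(float)
counts = np.array([
    [5, 3, 4, 7, 8, 6],  # no all-zero group
    [0, 0, 0, 7, 8, 6],  # first group is all zero
    [0, 0, 0, 0, 0, 0],  # every group is all zero
])
df_zeros = residual_df_with_zeros(counts, design, groups)
print(df_zeros)
```

For the first gene the result matches the usual residual degrees of freedom (samples minus coefficients). For the second gene, only the three samples of the non-zero group remain.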