inmoose.edgepy.glmFit
- inmoose.edgepy.glmFit(y, design=None, dispersion=None, offset=None, lib_size=None, weights=None, prior_count=0.125, start=None)
Fit a negative binomial generalized log-linear model to the read counts for each gene. Conduct genewise statistical tests for a given coefficient or coefficient contrast.
This function implements one of the GLM methods developed by [McCarthy2012].
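For orientation, the genewise model can be written as below. The notation is introduced here for illustration and is not taken from the function itself: g indexes genes (rows of y), i indexes libraries (columns of y), x_i is row i of design, β_g is row g of the returned coefficients, φ_g is the dispersion of gene g (φ_gi when a matrix of dispersions is given) and o_gi is the offset, log(lib_size[i]) by default.

```latex
y_{gi} \sim \mathrm{NB}(\mu_{gi}, \phi_g), \qquad
\operatorname{Var}(y_{gi}) = \mu_{gi} + \phi_g \, \mu_{gi}^{2}, \qquad
\log \mu_{gi} = \mathbf{x}_i^{\top} \boldsymbol{\beta}_g + o_{gi}
```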
glmFit fits genewise negative binomial GLMs, all with the same design matrix but possibly different dispersions, offsets and weights. When the design matrix defines a one-way layout, or can be re-parameterized to a one-way layout, the GLMs are fitted very quickly using mglmOneGroup(). Otherwise the default fitting method, implemented in mglmLevenberg(), uses a Fisher scoring algorithm with Levenberg-style damping.
Positive values of prior_count cause the returned coefficients to be shrunk so that fold-changes between the treatment conditions are decreased. In particular, infinite fold-changes are avoided. Larger values cause more shrinkage. The returned coefficients are affected, but the likelihood ratio tests and p-values are not.
See also
mglmOneGroup – low-level computations
mglmLevenberg – low-level computations
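The coefficient update behind both fitting routes can be illustrated on the simplest case: a single group (one intercept column), where log μ_i = β + o_i. The sketch below runs plain Fisher scoring on that one-parameter problem with numpy. The helper name one_group_fit and its tolerance settings are hypothetical; it omits the Levenberg-style damping and is not the code of mglmOneGroup() or mglmLevenberg().

```python
import numpy as np


def one_group_fit(y, offset, dispersion, tol=1e-10, max_iter=50):
    """Fisher scoring for a one-group NB GLM: log(mu_i) = beta + offset_i.

    Hypothetical helper for illustration; not inmoose's mglmOneGroup().
    """
    y = np.asarray(y, dtype=float)
    offset = np.asarray(offset, dtype=float)
    if np.all(y == 0):
        return float("-inf")  # the MLE diverges to -inf when every count is zero
    # Start from the Poisson MLE: log(sum(y) / sum(exp(offset)))
    beta = np.log(y.sum() / np.exp(offset).sum())
    for _ in range(max_iter):
        mu = np.exp(beta + offset)
        denom = 1.0 + dispersion * mu
        score = np.sum((y - mu) / denom)  # derivative of the NB log-likelihood in beta
        info = np.sum(mu / denom)         # expected (Fisher) information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return float(beta)
```

With dispersion=0 the starting value is already the Poisson maximum-likelihood estimate, so the loop stops on its first iteration; with a positive dispersion the score (y - μ)/(1 + φμ) down-weights observations with large means.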
- Parameters:
y (pd.DataFrame) – matrix of counts, with one row per gene and one column per library.
design (matrix, optional) – design matrix for the genewise linear models. Must be of full column rank. Defaults to a single column of ones, equivalent to treating the columns as replicate libraries.
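The full-rank requirement, and the one-way layout condition that selects the fast fitting path, can be checked directly. The sketch below tests full column rank with numpy and applies one simple criterion for a one-way layout: a full-rank design with exactly as many distinct rows as columns can be re-parameterized as one indicator column per group. The helper check_design is hypothetical and is not part of inmoose.edgepy.

```python
import numpy as np


def check_design(design):
    """Return (is_full_rank, is_one_way) for a design matrix (hypothetical helper)."""
    design = np.asarray(design, dtype=float)
    full_rank = np.linalg.matrix_rank(design) == design.shape[1]
    # Distinct rows define the groups; one group per column means a one-way layout
    n_distinct_rows = np.unique(design, axis=0).shape[0]
    one_way = full_rank and n_distinct_rows == design.shape[1]
    return bool(full_rank), bool(one_way)


n_libs = 6
default_design = np.ones((n_libs, 1))  # intercept only: all libraries are replicates
group = np.array([0, 0, 0, 1, 1, 1])
two_groups = np.column_stack([np.ones(n_libs), group])  # intercept + group effect
continuous = np.column_stack([np.ones(n_libs), np.arange(n_libs)])  # linear trend
```

Here default_design and two_groups pass both checks, while continuous is full rank but has six distinct rows for two columns, so it would need the general fitting method.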
dispersion (float or array_like) – scalar, vector or matrix of negative binomial dispersions. Can be a common value for all genes, a vector of dispersion values with one for each gene, or a matrix of dispersion values with one for each observation.
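The three accepted forms of dispersion can all be expanded to one value per observation. The sketch below shows one way to do that with numpy, assuming y has shape (n_genes, n_libraries); the helper expand_dispersion is hypothetical and is not how inmoose is documented to store dispersions internally.

```python
import numpy as np


def expand_dispersion(dispersion, shape):
    """Expand a scalar, per-gene vector or per-observation matrix to `shape`.

    `shape` is (n_genes, n_libraries). Hypothetical helper for illustration.
    """
    d = np.asarray(dispersion, dtype=float)
    n_genes, n_libs = shape
    if d.ndim == 0:
        return np.full(shape, float(d))  # common dispersion for all genes
    if d.ndim == 1:
        if d.shape[0] != n_genes:
            raise ValueError("need one dispersion per gene")
        return np.repeat(d[:, None], n_libs, axis=1)  # same value across each gene's row
    if d.shape != shape:
        raise ValueError("dispersion matrix must have the same shape as y")
    return d.copy()  # already one value per observation
```

A vector is always read as one value per gene, so a length-n_libraries vector is rejected rather than silently applied across columns.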
offset (float or array_like, optional) – matrix of the same shape as y giving offsets for the log-linear models. Can be a scalar or a vector of length y.shape[1], in which case it is broadcast to the shape of y.
lib_size (array_like, optional) – vector of length y.shape[1] giving library sizes. Only used if offset=None, in which case offset is set to log(lib_size). Defaults to colSums(y), the column sums of y.
weights (matrix, optional) – prior weights for the observations (for each library and gene) to be used in the GLM calculations.
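The rules for offset and lib_size can be summarized in a few lines of numpy. The sketch below is an illustration of the documented defaults, not inmoose's code; the helper default_offset is hypothetical. np.asarray also accepts a pandas DataFrame, so y may be passed in the documented form.

```python
import numpy as np


def default_offset(y, offset=None, lib_size=None):
    """Return an (n_genes, n_libraries) offset matrix (hypothetical helper)."""
    counts = np.asarray(y, dtype=float)
    n_genes, n_libs = counts.shape
    if offset is None:
        if lib_size is None:
            lib_size = counts.sum(axis=0)  # column sums: one total per library
        offset = np.log(np.asarray(lib_size, dtype=float))
    # A scalar or a length-n_libs vector is broadcast to one offset per observation
    return np.broadcast_to(np.asarray(offset, dtype=float), (n_genes, n_libs)).copy()
```

Passing lib_size has no effect once offset is given, matching the parameter description.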
prior_count (float) – average prior count to be added to each observation to shrink the estimated log-fold-changes towards zero. Defaults to 0.125.
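Why a prior count avoids infinite fold-changes can be seen on a two-group toy example. The sketch below adds a single pseudo-count to each group's pooled count before taking the natural-log ratio of proportions; it is a simplified illustration, not the exact shrinkage scheme glmFit uses, and the helper two_group_logfc is hypothetical.

```python
import numpy as np


def two_group_logfc(counts_a, counts_b, lib_a, lib_b, prior_count=0.0):
    """Natural-log fold change (B vs A) of pooled proportions with a pseudo-count.

    Simplified illustration only; not glmFit's shrinkage scheme.
    """
    a = (np.sum(counts_a) + prior_count) / np.sum(lib_a)
    b = (np.sum(counts_b) + prior_count) / np.sum(lib_b)
    with np.errstate(divide="ignore"):  # log(0) -> -inf without a warning
        return float(np.log(b) - np.log(a))
```

With all-zero counts in group A and prior_count=0 the result is infinite; any positive prior_count makes it finite, and larger values pull it further towards zero.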
start (matrix, optional) – initial estimates for the linear model coefficients
- Returns:
object containing:
counts, the input matrix of counts
design, the input design matrix
weights, the input weights matrix
offset, matrix of linear model offsets
dispersion, vector of dispersions used for the fit
coefficients, matrix of estimated coefficients from the GLM fits, on the natural log scale, of size y.shape[0] by design.shape[1]
unshrunk_coefficients, matrix of estimated coefficients from the GLM fits when no log-fold-change shrinkage is applied, on the natural log scale, of size y.shape[0] by design.shape[1]. Present only when prior_count is not 0.
fitted_values, matrix of fitted values from the GLM fits, same shape as y
deviance, numeric vector of deviances, one for each gene
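Under the log-linear model, the fitted values should follow from the unshrunk coefficients as exp(coefficients @ design.T + offset), and each gene's deviance is the sum of negative binomial unit deviances over its libraries. The sketch below recomputes both with numpy as a consistency check; it assumes strictly positive dispersions (a scalar or one per gene), ignores weights, and its helper names are hypothetical rather than attributes or functions of DGEGLM.

```python
import numpy as np


def nb_unit_deviance(y, mu, phi):
    """2 * [y*log(y/mu) - (y + 1/phi) * log((1 + phi*y) / (1 + phi*mu))], phi > 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        ylog = np.where(y > 0, y * np.log(y / mu), 0.0)  # y*log(y/mu) taken as 0 at y = 0
    return 2.0 * (ylog - (y + 1.0 / phi) * np.log((1.0 + phi * y) / (1.0 + phi * mu)))


def fitted_and_deviance(y, design, coefficients, offset, dispersion):
    """Return (fitted values, per-gene deviance). Hypothetical helper."""
    y = np.asarray(y, dtype=float)
    design = np.asarray(design, dtype=float)
    coefficients = np.asarray(coefficients, dtype=float)
    # log(mu[g, i]) = sum_j coefficients[g, j] * design[i, j] + offset[g, i]
    mu = np.exp(coefficients @ design.T + offset)
    phi = np.asarray(dispersion, dtype=float)
    if phi.ndim == 1:
        phi = phi[:, None]  # one dispersion per gene, applied across its row
    return mu, nb_unit_deviance(y, mu, phi).sum(axis=1)
```

A gene whose fitted values reproduce its counts exactly has zero deviance; any mismatch gives a positive value.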
- Return type:
DGEGLM