inmoose.edgepy.glmFit

inmoose.edgepy.glmFit(y, design=None, dispersion=None, offset=None, lib_size=None, weights=None, prior_count=0.125, start=None)

Fit a negative binomial generalized log-linear model to the read counts for each gene. Conduct genewise statistical tests for a given coefficient or coefficient contrast.

This function implements one of the GLM methods developed by [McCarthy2012].

glmFit fits genewise negative binomial GLMs, all with the same design matrix but possibly different dispersions, offsets and weights. When the design matrix defines a one-way layout, or can be re-parameterized to a one-way layout, the GLMs are fitted very quickly using mglmOneGroup(). Otherwise the default fitting method, implemented in mglmLevenberg(), uses a Fisher scoring algorithm with Levenberg-style damping.
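As an illustration of what "one-way layout" means here, the sketch below checks whether a design matrix has exactly as many distinct rows as it has columns. For a design of full column rank, that condition means the samples fall into groups that can each be given their own mean. The helper name is_one_way_layout is hypothetical, and this check is an assumption made for illustration, not necessarily the test glmFit performs internally.

```python
import numpy as np

def is_one_way_layout(design):
    """Return True if `design` has as many distinct rows as columns.

    Hypothetical helper: assumes `design` has full column rank, as glmFit requires.
    """
    design = np.asarray(design, dtype=float)
    n_groups = np.unique(design, axis=0).shape[0]
    return n_groups == design.shape[1]

# Intercept plus group indicator: two distinct rows, two columns.
two_groups = np.array([[1, 0], [1, 0], [1, 1], [1, 1]])
# Intercept plus a continuous covariate: four distinct rows, two columns.
covariate = np.array([[1, 0.5], [1, 1.0], [1, 1.5], [1, 2.0]])

one_way = is_one_way_layout(two_groups)
not_one_way = is_one_way_layout(covariate)
```

The first design can be re-parameterized as two group means, so it would qualify for the fast path described above. The second cannot, because each sample has its own covariate value.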

A positive prior_count causes the returned coefficients to be shrunk in such a way that fold-changes between the treatment conditions are decreased. In particular, infinite fold-changes are avoided. Larger values cause more shrinkage. The returned coefficients are affected but not the likelihood ratio tests or p-values.
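The effect of a prior count can be seen on a single gene with a simplified calculation. The sketch below adds a prior count directly to two group means and takes the natural-log ratio. This is a deliberate simplification under stated assumptions: shrunk_log_fold_change is a hypothetical helper, and glmFit applies the prior count as part of fitting the GLM rather than to group means.

```python
import numpy as np

def shrunk_log_fold_change(mean_a, mean_b, prior_count):
    """Natural-log fold-change of b over a after adding prior_count to both means."""
    mean_a = np.asarray(mean_a, dtype=float)
    mean_b = np.asarray(mean_b, dtype=float)
    return np.log((mean_b + prior_count) / (mean_a + prior_count))

# A gene with no reads in the control group and a mean of 11 in the treated group.
mean_control, mean_treated = 0.0, 11.0

# Without a prior count the fold-change is infinite (division by zero).
with np.errstate(divide="ignore"):
    lfc_raw = shrunk_log_fold_change(mean_control, mean_treated, 0.0)

# A small prior count gives a finite value; a larger one shrinks it further.
lfc_small = shrunk_log_fold_change(mean_control, mean_treated, 0.125)
lfc_large = shrunk_log_fold_change(mean_control, mean_treated, 2.0)
```

Here lfc_raw is infinite, lfc_small is finite, and lfc_large is closer to zero than lfc_small, matching the statement that larger values cause more shrinkage.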

See also

  • mglmOneGroup – low-level computations

  • mglmLevenberg – low-level computations

Parameters:
  • y (pd.DataFrame) – matrix of counts

  • design (matrix, optional) – design matrix for the genewise linear models. Must be of full column rank. Defaults to a single column of ones, equivalent to treating the columns as replicate libraries.

  • dispersion (float or array_like) – scalar, vector or matrix of negative binomial dispersions. Can be a common value for all genes, a vector of dispersion values with one for each gene, or a matrix of dispersion values with one for each observation.

  • offset (float or array_like, optional) – matrix of the same shape as y giving offsets for the log-linear models. Can be a scalar or a vector of length y.shape[1], in which case it is broadcast to the shape of y.

  • lib_size (array_like, optional) – vector of length y.shape[1] giving library sizes. Only used if offset=None, in which case offset is set to log(lib_size). Defaults to the column sums of y.

  • weights (matrix, optional) – prior weights for the observations (for each library and gene) to be used in the GLM calculations

  • prior_count (float) – average prior count to be added to each observation to shrink the estimated log-fold-changes towards zero.

  • start (matrix, optional) – initial estimates for the linear model coefficients
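The relationship between lib_size and offset described above can be sketched with NumPy. The example assumes a plain array in place of the pd.DataFrame that glmFit expects, and it only mirrors the documented defaults (column sums, natural log, broadcasting to the shape of y). It is not the library's own code.

```python
import numpy as np

# Rows are genes, columns are libraries (samples).
counts = np.array([[10, 20, 30],
                   [ 0,  5, 15],
                   [ 7,  7, 14]])

# Default library sizes: the column sums of the count matrix.
lib_size = counts.sum(axis=0)

# Default offset: natural log of the library sizes, one value per library...
offset_vector = np.log(lib_size)

# ...broadcast so that every gene (row) shares the same per-library offsets.
offset = np.broadcast_to(offset_vector, counts.shape)
```

Every row of offset is identical because the vector of length y.shape[1] is repeated for each gene.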

Returns:

object containing:

  • counts, the input matrix of counts

  • design, the input design matrix

  • weights, the input weights matrix

  • offset, matrix of linear model offsets

  • dispersion, vector of dispersions used for the fit

  • coefficients, matrix of estimated coefficients from the GLM fits, on the natural log scale, of size y.shape[0] by design.shape[1].

  • unshrunk_coefficients, matrix of estimated coefficients from the GLM fits when no log-fold-change shrinkage is applied, on the natural log scale, of size y.shape[0] by design.shape[1]. It exists only when prior_count is not 0.

  • fitted_values, matrix of fitted values from GLM fits, same shape as y

  • deviance, numeric vector of deviances, one for each gene

Return type:

DGEGLM