inmoose.edgepy.mglmLevenberg

inmoose.edgepy.mglmLevenberg(y, design, dispersion=0, offset=0, weights=None, coef_start=None, start_method='null', maxit=200, tol=1e-06)

Fit genewise negative binomial GLMs with log-link using Levenberg damping to ensure convergence.

This is a low-level workhorse used by higher-level functions, especially glmFit(). Most users will not need to call this function directly.

This function fits a negative binomial GLM to each row of y. The row-wise GLMs all have the same design matrix but possibly different dispersions, offsets and weights. It is low-level in that it operates on atomic objects (matrices and vectors).

The model fitted to each response vector can be an arbitrary log-linear model. The fit uses a Levenberg-Marquardt modification of the GLM scoring algorithm to prevent divergence. The dispersion parameter of the negative binomial distribution is treated as a known input and is not estimated.
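To make the idea of Levenberg damping concrete, here is a minimal NumPy sketch of a damped scoring loop for a single gene. It is an illustration under stated assumptions, not the inmoose implementation: the names nb_deviance, fit_one_gene and max_damping, the least-squares starting values, the damping schedule and the stopping rule are all choices made for this sketch.

```python
import numpy as np


def nb_deviance(y, mu, phi):
    """Residual deviance of one count vector (Poisson limit when phi == 0)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pos = y > 0
    # y * log(y / mu), using the convention 0 * log(0) = 0
    ylog = np.where(pos, y * np.log(np.where(pos, y, 1.0) / mu), 0.0)
    if phi == 0:
        return 2.0 * np.sum(ylog - (y - mu))
    r = 1.0 / phi
    return 2.0 * np.sum(ylog - (y + r) * np.log((y + r) / (mu + r)))


def fit_one_gene(y, X, phi=0.0, offset=0.0, maxit=200, tol=1e-6, max_damping=1e13):
    """Damped Fisher scoring for one NB log-link GLM with known dispersion phi."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n_coef = X.shape[1]
    # Illustrative starting values (an assumption of this sketch):
    # least squares on the log counts.
    beta = np.linalg.lstsq(X, np.log(y + 0.5) - offset, rcond=None)[0]
    mu = np.exp(X @ beta + offset)
    dev = nb_deviance(y, mu, phi)
    damping = 0.0
    failed = False
    n_iter = 0
    for n_iter in range(1, maxit + 1):
        denom = 1.0 + phi * mu
        score = X.T @ ((y - mu) / denom)          # gradient of the log-likelihood
        info = X.T @ (X * (mu / denom)[:, None])  # expected (Fisher) information
        scale = np.max(np.diag(info))
        while True:
            # Damped step: the larger the damping, the shorter the step.
            step = np.linalg.solve(info + damping * scale * np.eye(n_coef), score)
            beta_new = beta + step
            mu_new = np.exp(X @ beta_new + offset)
            dev_new = nb_deviance(y, mu_new, phi)
            # Accept the step if the deviance did not increase (up to rounding).
            if dev_new <= dev + 1e-12 * (abs(dev) + 1.0):
                break
            damping = max(2.0 * damping, 1e-6)
            if damping > max_damping:
                failed = True
                break
        if failed:
            break
        beta, mu, dev = beta_new, mu_new, dev_new
        damping /= 10.0  # relax the damping after a successful step
        if score @ step < tol:
            break
    return beta, mu, dev, n_iter, failed


# Two groups of two libraries each, with an intercept and a group effect.
X = np.array([[1, 0], [1, 0], [1, 1], [1, 1]], dtype=float)
y = np.array([10, 12, 30, 34], dtype=float)
beta, mu, dev, n_iter, failed = fit_one_gene(y, X, phi=0.1)
```

With equal offsets and a two-group design, the fitted values should agree with the group means of the counts, which gives a quick sanity check on the sketch.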

Parameters:

y (array_like) – matrix of negative binomial counts. Rows for genes and columns for libraries.
design (array_like) – design matrix of the GLM. Assumed to be of full column rank.
dispersion (float or array_like) – dispersion parameter for each GLM. Either a scalar giving one value for all genes, or a vector of length equal to the number of genes giving genewise dispersions.
offset (float or array_like) – offset to be included in the linear predictor of the log-linear model. Can be a scalar, a vector of length equal to the number of libraries, or a matrix of the same shape as y.
weights (matrix, optional) – vector or matrix of non-negative quantitative weights. Can be a vector of length equal to the number of libraries, or a matrix of the same shape as y.
coef_start (array_like, optional) – matrix of starting values for the linear model coefficients. The number of rows should agree with y and the number of columns should agree with design. This argument does not usually need to be set, as the automatic starting values perform well.
start_method (str) – method used to generate starting values when coef_start is None. Possible values are “null”, to start from the null model of equal expression levels, or “y”, to use the data as starting values for the mean.
maxit (int) – the maximum number of iterations for the Fisher scoring algorithm. The iteration will be stopped when this limit is reached even if the convergence criterion has not been satisfied.
tol (float) – the convergence tolerance.
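The shapes accepted for dispersion, offset and weights can be pictured with NumPy broadcasting. The sketch below is only an illustration: expand_per_library and expand_per_gene are hypothetical helpers, not part of inmoose, and the library may handle these inputs differently internally. Using log library sizes as the offset is a common convention assumed here for the example, not something this page prescribes.

```python
import numpy as np


def expand_per_library(x, shape):
    """Expand a scalar, a per-library vector, or a full matrix to `shape`."""
    x = np.asarray(x, dtype=float)
    n_genes, n_libs = shape
    if x.ndim == 0 or x.shape == (n_libs,) or x.shape == (n_genes, n_libs):
        # broadcast_to returns a read-only view; copy it before modifying.
        return np.broadcast_to(x, (n_genes, n_libs))
    raise ValueError(f"cannot expand shape {x.shape} to {shape}")


def expand_per_gene(x, n_genes):
    """Expand a scalar or a genewise vector to one value per gene."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 0 or x.shape == (n_genes,):
        return np.broadcast_to(x, (n_genes,))
    raise ValueError(f"cannot expand shape {x.shape} to {n_genes} genes")


# Counts: 2 genes (rows) by 3 libraries (columns).
y = np.array([[1, 2, 3], [4, 5, 6]])

# One offset per library, here the log library size (assumed convention).
offset = expand_per_library(np.log(y.sum(axis=0)), y.shape)

# One dispersion shared by all genes, and unit weights for every observation.
dispersion = expand_per_gene(0.1, y.shape[0])
weights = expand_per_library(1.0, y.shape)
```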

Returns:

tuple with the following components:

matrix of estimated coefficients for the linear models
matrix of fitted values
vector of residual deviances
number of iterations used
Boolean vector indicating genes for which the maximum damping was exceeded before convergence was achieved

Return type:

tuple
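Because the result is a plain tuple, it can help to name its components right after the call. The following sketch assumes the component order listed above; label_fit and converged_coefficients are hypothetical helpers, and the placeholder tuple only mimics the documented layout (it is not output from mglmLevenberg).

```python
import numpy as np


def label_fit(result):
    """Name the five components, in the order documented above."""
    coefficients, fitted, deviance, n_iter, failed = result
    return {
        "coefficients": np.asarray(coefficients, dtype=float),
        "fitted": np.asarray(fitted, dtype=float),
        "deviance": np.asarray(deviance, dtype=float),
        "n_iter": n_iter,
        "failed": np.asarray(failed, dtype=bool),
    }


def converged_coefficients(result):
    """Return the coefficients with NaN rows for genes flagged as failed."""
    fit = label_fit(result)
    coef = fit["coefficients"].copy()
    coef[fit["failed"]] = np.nan
    return coef


# Placeholder with the documented layout: 2 genes, 2 coefficients, 3 libraries.
placeholder = (
    np.array([[1.0, 0.5], [2.0, -0.3]]),  # coefficients
    np.ones((2, 3)),                      # fitted values
    np.array([0.4, 1.2]),                 # residual deviances
    5,                                    # iterations used
    np.array([False, True]),              # damping limit exceeded
)
masked = converged_coefficients(placeholder)
```

Masking the flagged genes rather than dropping them keeps the row order aligned with y.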