inmoose.edgepy.mglmOneGroup

inmoose.edgepy.mglmOneGroup(y, dispersion=0, offset=0, weights=None, coef_start=None, maxit=50, tol=1e-10, verbose=False)

Fit single-group negative-binomial GLMs genewise.

This is a low-level workhorse used by higher-level functions, especially glmFit(). Most users will not need to call it directly.

This function fits a negative binomial GLM to each row of y. The row-wise GLMs all have the same design matrix but possibly different dispersions, offsets and weights. It is low-level in that it operates on atomic objects (matrices and vectors).

This function fits an intercept-only model to each response vector. In other words, it treats all the libraries as belonging to one group. It implements Fisher scoring with a score-statistic stopping criterion for each gene. It treats the dispersion parameter of the negative binomial distribution as a known input. Excellent starting values are available for the null model, so this function seldom has convergence problems. It is used by other functions to compute the overall abundance for each gene.
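
The scheme described above can be sketched in plain NumPy. This is an illustration only, not inmoose's implementation: the name fit_one_group_sketch is hypothetical, the closed-form Poisson-style starting value is an assumption, and the loop stops on the absolute step size, following the description of tol. The model is log(mu) = beta + offset, with one beta per row.

```python
import numpy as np


def fit_one_group_sketch(y, dispersion=0.0, offset=0.0, weights=None,
                         maxit=50, tol=1e-10):
    """Hypothetical sketch: intercept-only NB GLM per row via Fisher scoring."""
    y = np.asarray(y, dtype=float)
    # Expand scalar / per-gene / per-library inputs to the shape of y.
    phi = np.broadcast_to(
        np.asarray(dispersion, dtype=float).reshape(-1, 1), y.shape)
    off = np.broadcast_to(np.asarray(offset, dtype=float), y.shape)
    if weights is None:
        w = np.ones_like(y)
    else:
        w = np.broadcast_to(np.asarray(weights, dtype=float), y.shape)

    # Assumed starting value: the exact solution when dispersion is zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        beta = np.log((w * y).sum(axis=1) / (w * np.exp(off)).sum(axis=1))
    # Rows with zero total count start at -inf and are left untouched.
    active = np.isfinite(beta)

    for _ in range(maxit):
        if not active.any():
            break
        mu = np.exp(beta[active][:, None] + off[active])
        denom = 1.0 + phi[active] * mu
        # Score and expected information for the intercept, per row.
        score = (w[active] * (y[active] - mu) / denom).sum(axis=1)
        info = (w[active] * mu / denom).sum(axis=1)
        step = score / info
        beta[active] += step
        # Rows whose step is below tol are considered converged.
        idx = np.flatnonzero(active)
        active[idx[np.abs(step) < tol]] = False
    return beta
```

When all offsets in a row are equal, the fitted mean is the (weighted) row mean whatever the dispersion, so the starting value is already the solution; iteration only matters once offsets differ across libraries and the dispersion is non-zero.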

Parameters:

y (array_like) – matrix of negative binomial counts. Rows for genes and columns for libraries.
dispersion (float or array_like) – scalar or vector giving the dispersion parameter for each GLM. Can be a scalar giving one value for all genes, or a vector of length equal to the number of genes giving genewise dispersions.
offset (float or array_like) – scalar, vector or matrix giving the offset to be included in the log-linear model predictor. Can be a scalar, a vector of length equal to the number of libraries, or a matrix of the same shape as y.
weights (array_like, optional) – vector or matrix of non-negative quantitative weights. Can be a vector of length equal to the number of libraries, or a matrix of the same shape as y.
coef_start (array_like, optional) – matrix of starting values for the linear model coefficient. It should have the same number of rows as y and a single column. This argument does not usually need to be set, as the automatic starting values perform well.
maxit (int) – the maximum number of iterations for the Fisher scoring algorithm. The iteration will be stopped when this limit is reached even if the convergence criterion has not been satisfied.
tol (float) – the convergence tolerance. Convergence is judged successful when the step size falls below tol in absolute value.

Returns:

vector of coefficients, one per row of y

Return type:

ndarray
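
To see how such a coefficient can be read as an abundance, consider the special case of zero dispersion and no weights, where the intercept-only log-linear model has a closed-form solution. The toy counts and library sizes below are hypothetical, and the calculation is done directly in NumPy rather than by calling mglmOneGroup.

```python
import numpy as np

# Hypothetical toy data: 2 genes (rows) x 3 libraries (columns).
y = np.array([[10.0, 20.0, 30.0],
              [0.0, 5.0, 5.0]])
lib_size = np.array([1e6, 2e6, 3e6])

# Using log library sizes as offsets: log(mu) = coef + offset.
offset = np.log(lib_size)

# Zero-dispersion (Poisson) closed form for the intercept of each gene:
# total count divided by total exp(offset), on the log scale.
coef = np.log(y.sum(axis=1) / np.exp(offset).sum())

# Fitted means: each gene's abundance scaled by the library sizes.
fitted = np.exp(coef[:, None] + offset[None, :])
```

With this choice of offset, exp(coef) is the gene's share of the total library size, and the fitted means in each row sum to the observed row total.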