inmoose.edgepy.dispCoxReid

inmoose.edgepy.dispCoxReid(y, design=None, offset=None, weights=None, AveLogCPM=None, interval=(0, 4), tol=1e-05, min_row_sum=5, subset=10000)

Estimate a common dispersion parameter across multiple negative binomial GLMs, by maximizing the Cox-Reid adjusted profile likelihood.

This is a low-level function called by estimateGLMCommonDisp().

The Cox-Reid adjusted profile likelihood (Cox and Reid, 1987 [1]) is maximized numerically with scipy.optimize.minimize_scalar(), searching over interval to the accuracy given by tol.
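As an illustration of the optimization step only, the sketch below maximizes a toy objective over a bounded interval by minimizing its negative with scipy.optimize.minimize_scalar(). It is not the library's implementation. The objective here is a plain negative binomial log-likelihood with a known mean, not the Cox-Reid adjusted profile likelihood. The helper nb_loglik, the simulated counts, and the small positive lower bound are assumptions made for this example. The parametrization used internally by dispCoxReid may also differ.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln


def nb_loglik(dispersion, counts, mu):
    """Toy negative binomial log-likelihood (hypothetical helper).

    Uses mean `mu` and variance `mu + dispersion * mu**2`.
    """
    r = 1.0 / dispersion
    return np.sum(
        gammaln(counts + r) - gammaln(r) - gammaln(counts + 1)
        + r * np.log(r / (r + mu))
        + counts * np.log(mu / (r + mu))
    )


# Simulated counts with a known dispersion (illustrative values only).
rng = np.random.default_rng(0)
true_dispersion = 0.2
mu = 50.0
r_true = 1.0 / true_dispersion
counts = rng.negative_binomial(n=r_true, p=r_true / (r_true + mu), size=(200, 6))

# Lower bound kept slightly above zero in this sketch to avoid dividing by zero.
interval = (1e-6, 4.0)
tol = 1e-5

# minimize_scalar minimizes, so the objective is negated to maximize it.
result = minimize_scalar(
    lambda d: -nb_loglik(d, counts, mu),
    bounds=interval,
    method="bounded",
    options={"xatol": tol},
)
estimate = float(result.x)
print(f"estimated dispersion: {estimate:.3f}")
```

The bounded method keeps every trial value inside interval, which is why the search range and the tolerance are the two tuning parameters exposed by the function.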

Robinson and Smyth (2008) [2] and McCarthy et al. (2012) [3] showed that the Pearson (pseudo-likelihood) estimator typically underestimates the true dispersion and can be seriously biased when the number of libraries is small. The deviance (quasi-likelihood) estimator, on the other hand, typically overestimates the true dispersion when the number of libraries is small. The same studies found the Cox-Reid estimator to be the least biased of the three.

Parameters:

y (matrix) – matrix of counts. A GLM is fitted to each row
design (matrix, optional) – design matrix, as in glmFit()
offset (array_like, optional) – vector or matrix of offsets for the log-linear models, as in glmFit(). Defaults to log(colSums(y)), i.e. the log of the column sums of y.
weights (matrix, optional) – observation weights
AveLogCPM (array_like, optional) – vector giving average log2 counts per million
interval (tuple, optional) – pair giving minimum and maximum allowed values for the dispersion, passed to scipy.optimize.minimize_scalar()
tol (float, optional) – the desired accuracy, see scipy.optimize.minimize_scalar()
min_row_sum (int, optional) – only rows whose total count is at least this value are used
subset (int, optional) – number of rows to use in the calculation. The rows used are chosen evenly spaced by AveLogCPM.
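The row selection described by min_row_sum and subset can be pictured with the following sketch. It assumes NumPy arrays as input. The function select_rows and the demo data are hypothetical and are not part of inmoose. The exact rule the library uses to decide when and how to subsample may differ. The sketch drops rows with too few counts, sorts the remaining rows by AveLogCPM, and takes evenly spaced positions along that ordering.

```python
import numpy as np


def select_rows(y, ave_log_cpm, min_row_sum=5, subset=10000):
    """Return indices of rows to use (hypothetical helper, not the inmoose API)."""
    y = np.asarray(y)
    ave_log_cpm = np.asarray(ave_log_cpm, dtype=float)

    # Keep only rows whose total count reaches min_row_sum.
    keep = np.flatnonzero(y.sum(axis=1) >= min_row_sum)
    if subset is None or subset >= keep.size:
        return keep

    # Order the kept rows by abundance, then pick evenly spaced positions.
    order = keep[np.argsort(ave_log_cpm[keep], kind="stable")]
    positions = np.round(np.linspace(0, order.size - 1, num=subset)).astype(int)
    return order[positions]


# Small demo: 7 rows, 2 libraries (illustrative values only).
y_demo = np.array([[0, 0], [1, 2], [5, 5], [10, 10], [15, 15], [20, 20], [25, 25]])
ave_demo = np.array([0.0, 1.0, 5.0, 2.0, 4.0, 3.0, 6.0])

chosen = select_rows(y_demo, ave_demo, min_row_sum=5, subset=3)
print(chosen.tolist())
```

Spacing the chosen rows evenly along the AveLogCPM ordering keeps the subset spread across the whole abundance range while reducing the number of rows that enter the likelihood.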

Returns:

the estimated common dispersion

Return type:

float

References