inmoose.edgepy.dispCoxReid
- inmoose.edgepy.dispCoxReid(y, design=None, offset=None, weights=None, AveLogCPM=None, interval=(0, 4), tol=1e-05, min_row_sum=5, subset=10000)
Estimate a common dispersion parameter across multiple negative binomial GLMs, by maximizing the Cox-Reid adjusted profile likelihood.
This is a low-level function called by estimateGLMCommonDisp().
Estimation is done by maximizing the Cox-Reid adjusted profile likelihood (Cox and Reid, 1987 [1]), through scipy.optimize.minimize_scalar().
Robinson and Smyth (2008) [2] and McCarthy et al. (2012) [3] showed that the Pearson (pseudo-likelihood) estimator typically under-estimates the true dispersion. It can be seriously biased when the number of libraries is small. On the other hand, the deviance (quasi-likelihood) estimator typically over-estimates the true dispersion when the number of libraries is small. Robinson and Smyth (2008) [2] and McCarthy et al. (2012) [3] showed the Cox-Reid estimator to be the least biased of the three options.
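To make the idea concrete, the following is a minimal, standard-library-only sketch of maximizing a Cox-Reid adjusted profile likelihood over a bounded dispersion interval. It is not the inmoose implementation. It assumes an intercept-only design and equal library sizes (so the fitted mean of each row is simply its average), and it swaps scipy.optimize.minimize_scalar() for a plain golden-section search. All function names below are hypothetical and chosen for illustration only.

```python
import math

def nb_loglik(y, mu, phi):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion phi (variance = mu + phi * mu**2)."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def adjusted_profile_loglik(phi, counts):
    """Sum over rows of the Cox-Reid adjusted profile log-likelihood.
    ASSUMPTION: intercept-only design, equal library sizes, all row means > 0."""
    total = 0.0
    for row in counts:
        n = len(row)
        mu = sum(row) / n  # ML estimate of the common mean for this row
        loglik = sum(nb_loglik(y, mu, phi) for y in row)
        # Cox-Reid term: -0.5 * log det(X' W X) with working weights
        # W = mu / (1 + phi * mu); for an intercept-only X this is a scalar.
        info = n * mu / (1.0 + phi * mu)
        total += loglik - 0.5 * math.log(info)
    return total

def golden_section_max(f, lo, hi, tol=1e-5):
    """Maximize f on (lo, hi) by golden-section search (stand-in for
    a bounded scalar optimizer)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def estimate_common_dispersion(counts, interval=(0.0, 4.0), tol=1e-5, min_row_sum=5):
    """Hypothetical helper mirroring the interval, tol and min_row_sum arguments."""
    kept = [row for row in counts if sum(row) >= min_row_sum]
    return golden_section_max(lambda phi: adjusted_profile_loglik(phi, kept),
                              interval[0], interval[1], tol)

# Toy data: rows are genes, columns are libraries.
counts = [
    [5, 20, 1, 40],
    [100, 10, 50, 300],
    [2, 0, 7, 15],
    [30, 4, 90, 12],
    [0, 1, 0, 0],  # dropped: row sum is below min_row_sum
]
dispersion = estimate_common_dispersion(counts)
print(dispersion)
```

A single dispersion value is shared by all rows, which is why the per-row adjusted log-likelihoods are summed before the one-dimensional search. With a general design matrix and unequal library sizes, each row would instead need a fitted GLM to obtain its means and weights.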
- Parameters:
y (matrix) – matrix of counts. A GLM is fitted to each row.
design (matrix, optional) – design matrix, as in glmFit().
offset (array_like, optional) – vector or matrix of offsets for the log-linear models, as in glmFit(). Defaults to log(colSums(y)).
weights (matrix, optional) – observation weights.
AveLogCPM (array_like, optional) – vector giving average log2 counts per million.
interval (tuple, optional) – pair giving the minimum and maximum allowed values for the dispersion, passed to scipy.optimize.minimize_scalar().
tol (float, optional) – the desired accuracy, see scipy.optimize.minimize_scalar().
min_row_sum (int, optional) – only rows with at least this number of counts are used.
subset (int, optional) – number of rows to use in the calculation. The rows used are chosen evenly spaced by AveLogCPM.
- Returns:
the estimated common dispersion
- Return type:
float
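The row-selection parameters described above can be illustrated with a short standard-library sketch. It shows one plausible reading of the default offset log(colSums(y)), the min_row_sum filter, and the choice of subset rows evenly spaced by AveLogCPM. The helper names are hypothetical, and the exact indexing and the order in which inmoose applies these steps are assumptions, not taken from its source.

```python
import math

def default_offset(counts):
    """Log of each library's total count, i.e. log(colSums(y))."""
    n_libs = len(counts[0])
    return [math.log(sum(row[j] for row in counts)) for j in range(n_libs)]

def rows_passing_min_sum(counts, min_row_sum=5):
    """Indices of rows whose total count is at least min_row_sum."""
    return [i for i, row in enumerate(counts) if sum(row) >= min_row_sum]

def evenly_spaced_subset(ave_log_cpm, subset):
    """Indices of `subset` rows spread evenly across the AveLogCPM ordering.
    ASSUMPTION: positions run from the lowest to the highest abundance."""
    order = sorted(range(len(ave_log_cpm)), key=lambda i: ave_log_cpm[i])
    n = len(order)
    if subset >= n:
        return order          # nothing to thin out
    if subset <= 1:
        return order[:1]
    positions = [int(k * (n - 1) / (subset - 1)) for k in range(subset)]
    return [order[p] for p in positions]

counts = [[1, 2], [3, 4], [0, 1]]
offsets = default_offset(counts)                 # log of column sums 4 and 7
kept = rows_passing_min_sum(counts, min_row_sum=3)
chosen = evenly_spaced_subset([5.0, 1.0, 3.0, 2.0, 4.0], subset=3)
print(offsets, kept, chosen)
```

Thinning rows by abundance rank rather than at random keeps the full range of expression levels represented while bounding the cost of the likelihood evaluation.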
References