inmoose.edgepy.dispCoxReidInterpolateTagwise
- inmoose.edgepy.dispCoxReidInterpolateTagwise(y, design, dispersion, offset=None, trend=True, AveLogCPM=None, min_row_sum=5, prior_df=10, span=0.3, grid_npts=11, grid_range=(-6, 6), weights=None)
Estimate genewise dispersion parameters across multiple negative binomial GLMs using weighted Cox-Reid adjusted profile likelihood and cubic spline interpolation over a genewise grid.
In the context of edgepy, dispCoxReidInterpolateTagwise() is a low-level function called by estimateGLMTagwiseDisp().
This function calls maximizeInterpolant() to fit cubic spline interpolation over a genewise grid.
Note that the terms “tag” and “gene” are synonymous here. The function is only named “tagwise” for historical reasons.
- Parameters:
y (matrix) – matrix of counts
design (matrix) – design matrix for the GLM to fit
dispersion (float or array_like) – scalar or vector giving the dispersion(s) towards which the genewise dispersion parameters are shrunk
offset (float or array_like, optional) – scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the genes. If a scalar, then this value is used as an offset for all genes and libraries. If a vector, it should have length equal to the number of libraries, and the same vector of offsets is used for each gene. If a matrix, then each library for each gene has its unique offset. In adjustedProfileLik() the offset must be a matrix with the same shape as the matrix of counts.
trend (bool, optional) – whether abundance-dispersion trend is used for smoothing
AveLogCPM (array_like, optional) – vector of average log2 counts per million for each gene
min_row_sum (int, optional) – value to filter out low abundance genes. Only genes with total sum of counts above this threshold are used. Low abundance genes can adversely affect the estimation of the common dispersion, so this argument allows the user to select an appropriate filter threshold for gene abundance. Defaults to 5.
prior_df (float, optional) – prior smoothing parameter that indicates the weight to give to the common likelihood compared to the individual gene’s likelihood; default getPriorN(obj) gives a value for prior_n that is equivalent to giving the common likelihood 20 prior degrees of freedom in the estimation of the genewise dispersion.
span (float, optional) – parameter between 0 and 1 specifying proportion of data to be used in the local regression moving window. Larger values give smoother fits.
grid_npts (int, optional) – the number of points at which to place knots for the spline-based estimation of the genewise dispersions.
grid_range (tuple, optional) – relative range, in terms of log2(dispersion), on either side of the trendline for each gene for spline grid points.
weights (matrix, optional) – observation weights
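The grid_npts and grid_range parameters describe a per-gene grid in log2(dispersion) units around the trend. The following is a minimal sketch of how such a grid could be built with NumPy. The helper name log2_dispersion_grid and its argument trend_dispersion are hypothetical and are not part of inmoose; the library's internal construction may differ.

```python
import numpy as np


def log2_dispersion_grid(trend_dispersion, grid_npts=11, grid_range=(-6, 6)):
    """Candidate dispersions per gene, evenly spaced in log2 around the trend.

    Hypothetical helper for illustration only; not an inmoose function.
    """
    trend = np.atleast_1d(np.asarray(trend_dispersion, dtype=float))
    # Offsets relative to the trend, expressed in log2(dispersion) units.
    steps = np.linspace(grid_range[0], grid_range[1], grid_npts)
    # Rows are genes, columns are grid points.
    return 2.0 ** (np.log2(trend)[:, None] + steps[None, :])


# Two genes whose trended dispersions are 0.1 and 0.4 (made-up values).
grid = log2_dispersion_grid([0.1, 0.4])
print(grid.shape)
print(grid[0, [0, 5, -1]])
```

With the default arguments the grid has an odd number of points, so the middle column coincides with the trend value itself and the outer columns lie a factor of 2**6 below and above it.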
- Returns:
vector of genewise dispersions, with the same length as the number of genes in the input matrix of counts
- Return type:
ndarray
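The offset and min_row_sum descriptions in the parameter list can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the library's implementation: the helpers expand_offset and abundant_genes are hypothetical names, and the strict comparison follows the wording "above this threshold", so the behaviour at exactly min_row_sum may differ in inmoose.

```python
import numpy as np


def expand_offset(offset, counts):
    """Broadcast a scalar, per-library vector, or full matrix to counts.shape.

    Hypothetical helper for illustration only; not an inmoose function.
    """
    counts = np.asarray(counts)
    # np.broadcast_to raises ValueError when the shapes are incompatible,
    # e.g. a vector whose length differs from the number of libraries.
    return np.broadcast_to(np.asarray(offset, dtype=float), counts.shape)


def abundant_genes(counts, min_row_sum=5):
    """Boolean mask with one entry per gene (row) of the count matrix."""
    counts = np.asarray(counts)
    # Assumption: strict inequality, following "above this threshold".
    return counts.sum(axis=1) > min_row_sum


# Made-up counts: 3 genes (rows) by 4 libraries (columns).
counts = np.array([
    [0, 1, 0, 2],      # row sum 3
    [10, 12, 9, 11],   # row sum 42
    [3, 0, 2, 1],      # row sum 6
])

scalar_offset = expand_offset(0.5, counts)                    # same value everywhere
vector_offset = expand_offset([1.0, 2.0, 3.0, 4.0], counts)   # one value per library
mask = abundant_genes(counts)
print(scalar_offset.shape, vector_offset.shape, mask)
```

The mask keeps one entry per gene, which matches the shape of the returned dispersion vector.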