inmoose.edgepy.dispCoxReidInterpolateTagwise

inmoose.edgepy.dispCoxReidInterpolateTagwise(y, design, dispersion, offset=None, trend=True, AveLogCPM=None, min_row_sum=5, prior_df=10, span=0.3, grid_npts=11, grid_range=(-6, 6), weights=None)

Estimate genewise dispersion parameters across multiple negative binomial GLMs using weighted Cox-Reid adjusted profile likelihood and cubic spline interpolation over a genewise grid.
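The weighting idea can be pictured with a small sketch. It is not taken from the inmoose source: it assumes the scheme of the edgeR routine of the same name, where each gene's likelihood curve is averaged with a shared curve using a prior weight of prior_df divided by the residual degrees of freedom. It covers only the simpler case where the shared curve is the mean over all genes. The helper name `blend_with_shared_likelihood` and the toy numbers are hypothetical.

```python
import numpy as np

def blend_with_shared_likelihood(apl, prior_df, n_libs, n_coefs):
    """Blend each gene's likelihood curve with the mean curve over all genes.

    apl has shape (n_genes, n_grid_points). Assumption: the prior weight is
    prior_df divided by the residual degrees of freedom (n_libs - n_coefs).
    """
    prior_n = prior_df / (n_libs - n_coefs)
    # Shared curve: mean over genes at each grid point, shape (1, n_grid_points)
    shared = apl.mean(axis=0, keepdims=True)
    # Weighted average of the gene's own curve and the shared curve
    return (apl + prior_n * shared) / (1.0 + prior_n)

# Toy values only (3 genes, 4 grid points); these are not real likelihoods
apl = np.array([
    [-10.0, -8.0, -9.0, -12.0],
    [-20.0, -15.0, -14.0, -18.0],
    [-5.0, -4.0, -6.0, -9.0],
])
blended = blend_with_shared_likelihood(apl, prior_df=10, n_libs=6, n_coefs=2)
shared = apl.mean(axis=0, keepdims=True)
```

Under this assumption a larger prior_df pulls every gene's curve closer to the shared curve, and therefore pulls the genewise estimates closer to the supplied dispersion.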

In the context of edgepy, dispCoxReidInterpolateTagwise() is a low-level function called by estimateGLMTagwiseDisp().

This function calls maximizeInterpolant() to fit a cubic spline through the likelihood values evaluated on the genewise grid and to locate the maximum of that spline.
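As a rough illustration of grid-based maximization, the sketch below builds a grid from grid_npts and grid_range, fits a cubic spline through toy values, and picks the maximum. It is not the maximizeInterpolant() implementation. It uses scipy's `CubicSpline` and a fine-grid search as stand-ins, the helper `maximize_on_grid` is hypothetical, and the conversion `trend * 2**offset` is an assumption based on the grid being expressed in log2(dispersion) units.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def maximize_on_grid(grid, values, n_fine=2401):
    """Locate the maximum of a cubic spline through (grid, values).

    Stand-in approach: evaluate the spline on a fine grid and take the argmax.
    """
    spline = CubicSpline(grid, values)
    fine = np.linspace(grid[0], grid[-1], n_fine)
    return fine[np.argmax(spline(fine))]

# Grid of log2 offsets, matching the documented defaults
grid_npts, grid_range = 11, (-6, 6)
grid = np.linspace(grid_range[0], grid_range[1], grid_npts)

# Assumption: candidate dispersions are the trend value scaled by 2**offset
trend_dispersion = 0.1
candidate_dispersions = trend_dispersion * 2.0 ** grid

# Toy likelihood curve for one gene, peaked at a log2 offset of 1.3
toy_likelihood = -(grid - 1.3) ** 2
best_offset = maximize_on_grid(grid, toy_likelihood)
genewise_dispersion = trend_dispersion * 2.0 ** best_offset
```

The interpolation lets the estimate fall between grid points, so a coarse grid of 11 points can still return a finely resolved dispersion.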

Note that the terms “tag” and “gene” are synonymous here. The function is only named “tagwise” for historical reasons.

Parameters:
  • y (matrix) – matrix of counts

  • design (matrix) – design matrix for the GLM to fit

  • dispersion (float or array_like) – scalar or vector giving the dispersion(s) towards which the genewise dispersion parameters are shrunk

  • offset (float or array_like, optional) – scalar, vector or matrix giving the offset (in addition to the log of the effective library size) to include in the NB GLM for the genes. A scalar is used as the offset for all genes and libraries. A vector should have length equal to the number of libraries, and the same vector of offsets is used for each gene. A matrix gives each library for each gene its own offset. adjustedProfileLik() requires the offset to be a matrix with the same shape as the matrix of counts.

  • trend (bool, optional) – whether an abundance-dispersion trend is used for smoothing

  • AveLogCPM (array_like, optional) – vector of average log2 counts per million for each gene

  • min_row_sum (int, optional) – threshold for filtering out low-abundance genes. Only genes whose total count is above this threshold are used. Low-abundance genes can adversely affect dispersion estimation, so this argument lets the user choose an appropriate abundance filter. Defaults to 5.

  • prior_df (float, optional) – prior degrees of freedom, a smoothing parameter that sets the weight given to the common likelihood relative to the individual gene’s likelihood in the estimation of the genewise dispersion. Defaults to 10.

  • span (float, optional) – parameter between 0 and 1 specifying proportion of data to be used in the local regression moving window. Larger values give smoother fits.

  • grid_npts (int, optional) – the number of grid points at which to place knots for the spline-based estimation of the genewise dispersions.

  • grid_range (tuple, optional) – relative range, in terms of log2(dispersion), on either side of trendline for each gene for spline grid points.

  • weights (matrix, optional) – observation weights
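The three accepted forms of offset can be illustrated with NumPy broadcasting. This is a minimal sketch and not the library's own code: `expand_offset` is a hypothetical helper, and it assumes the count matrix has genes in rows and libraries in columns.

```python
import numpy as np

def expand_offset(offset, counts):
    """Expand a scalar, per-library vector, or full matrix to counts.shape.

    np.broadcast_to raises ValueError when the shapes are incompatible,
    for example when a vector's length differs from the number of libraries.
    """
    offset = np.asarray(offset, dtype=float)
    return np.broadcast_to(offset, counts.shape).copy()

# Toy counts: 2 genes (rows) x 3 libraries (columns)
counts = np.array([[10, 12, 8],
                   [0, 3, 5]])

from_scalar = expand_offset(0.5, counts)              # same value everywhere
from_vector = expand_offset([0.1, 0.2, 0.3], counts)  # one value per library
from_matrix = expand_offset(np.zeros((2, 3)), counts) # already full shape
```

In each case the result has the same shape as the matrix of counts, which is the form the description of offset says adjustedProfileLik() requires.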

Returns:

vector of genewise dispersions, with the same length as the number of genes in the input matrix of counts

Return type:

ndarray