inmoose.edgepy.aveLogCPM

inmoose.edgepy.aveLogCPM(y, lib_size=None, offset=None, prior_count=2, dispersion=None, weights=None)

Compute average log2 counts per million for each row of counts.

This function uses mglmOneGroup() to compute average counts per million (AveCPM) for each row of counts, and returns log2(AveCPM). An average value of prior_count is added to the counts before running mglmOneGroup(). If prior_count is a vector, each entry will be added to all counts in the corresponding row of y, as described in addPriorCount().
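The phrase "an average value of prior_count" can be illustrated with a small NumPy sketch. It assumes the edgeR-style convention in which the prior count is scaled in proportion to each library's size, so that the amount added averages to prior_count across libraries. The helper name add_scaled_prior is hypothetical and is not part of inmoose; the actual behaviour is defined by addPriorCount().

```python
import numpy as np

def add_scaled_prior(y, prior_count=2.0, lib_size=None):
    """Hypothetical helper (not inmoose API): add a library-size-scaled prior count."""
    y = np.asarray(y, dtype=float)
    if lib_size is None:
        lib_size = y.sum(axis=0)              # same default as documented for lib_size
    lib_size = np.asarray(lib_size, dtype=float)
    # Assumption: the prior is proportional to library size; scale averages to 1.
    scale = lib_size / lib_size.mean()
    # Scalar prior -> shape (1, 1); per-row vector -> shape (n_rows, 1).
    prior = np.atleast_1d(np.asarray(prior_count, dtype=float)).reshape(-1, 1)
    return y + prior * scale                  # broadcasts to (n_rows, n_libs)

y = np.array([[0.0, 12.0],
              [5.0, 7.0]])
lib_size = np.array([1e6, 3e6])

added_scalar = add_scaled_prior(y, 2.0, lib_size) - y         # same prior for every row
added_vector = add_scaled_prior(y, [1.0, 4.0], lib_size) - y  # one prior per row
print(added_scalar.mean(axis=1), added_vector.mean(axis=1))
```

Under this assumption, larger libraries receive a larger share of the prior, while the per-row mean of the added amount equals the requested prior_count.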

This function is similar to log2(rowMeans(cpm(y, ...))), but with the refinement that larger library sizes are given more weight in the average. The two versions will agree for large values of the dispersion.
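To make the difference between a plain and a library-size-weighted average concrete, the sketch below compares two simplified quantities in NumPy. It is an illustration only, not the mglmOneGroup() computation: prior counts are omitted, and the "pooled" estimate shown is what a one-group log-linear fit with log library-size offsets gives in the zero-dispersion (Poisson) case. The function names are hypothetical.

```python
import numpy as np

def naive_log2_mean_cpm(y, lib_size):
    """Unweighted analogue of log2(rowMeans(cpm(y))): every library counts equally."""
    cpm = np.asarray(y, dtype=float) / np.asarray(lib_size, dtype=float) * 1e6
    return np.log2(cpm.mean(axis=1))

def pooled_log2_cpm(y, lib_size):
    """Pooled estimate: total count over total library size (zero-dispersion case)."""
    y = np.asarray(y, dtype=float)
    total_lib = np.sum(np.asarray(lib_size, dtype=float))
    return np.log2(y.sum(axis=1) / total_lib * 1e6)

y = np.array([[10.0, 30.0],    # counts proportional to library size
              [20.0, 20.0]])   # equal counts despite unequal library sizes
lib_size = np.array([1e6, 3e6])

naive = naive_log2_mean_cpm(y, lib_size)
pooled = pooled_log2_cpm(y, lib_size)
print(naive, pooled)
```

The first row gives the same value either way because its CPM is identical in both libraries. In the second row the small library has the higher CPM, so the unweighted mean is pulled upward, whereas the pooled estimate lets the larger library dominate. Neither sketch guards against zero counts, which is the role of prior_count.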

See also

  • cpm – for individual logCPM values, rather than genewise averages

  • addPriorCount – uses the same strategy to add the prior counts

  • mglmOneGroup – computations for this function rely on mglmOneGroup()

Parameters:
  • y (matrix) – matrix of counts. Rows for genes and columns for libraries.

  • lib_size (array_like, optional) – vector of library sizes. Defaults to np.sum(y, axis=0). Ignored if offset is not None.

  • offset (matrix, optional) – matrix of offsets for the log-linear models. Defaults to None.

  • prior_count (float or array_like, optional) – scalar or vector of length y.shape[0], containing the average value(s) to be added to each count to avoid infinite values on the log scale. Defaults to 2.

  • dispersion (float or array_like, optional) – scalar or vector of negative binomial dispersions.

  • weights (matrix, optional) – matrix of observation weights. Defaults to None.

Returns:

numeric vector giving log2(AveCPM) for each row of y

Return type:

ndarray