inmoose.edgepy.addPriorCount

inmoose.edgepy.addPriorCount(y, lib_size=None, offset=None, prior_count=1)

Add a library size-adjusted prior count to each observation.

This function adds a positive prior count to each observation, often useful to avoid zeroes during calculation of log-values. For example, predFC() will call this function to calculate shrunken log-fold changes. aveLogCPM() and cpm() also use the same underlying code to calculate (average) log-counts per million.

The actual value added to the counts for each library is scaled according to the library size. This ensures that the relative contribution of the prior is the same for each library. Otherwise, a fixed prior would have little effect on a large library, but a big effect on a small library.

The library sizes are also modified, with twice the scaled prior being added to the library size for each library. To understand the motivation for this, consider that each observation is, effectively, a proportion of the total count in the library. The addition scheme implemented here represents an empirical logistic transform and ensures that the proportion can never be zero or one.
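The scaling described above can be illustrated with a short NumPy sketch. This is not the inmoose implementation: the helper name add_prior_count_sketch is hypothetical, and two details are assumptions not stated in this page, namely that the prior is scaled by each library size relative to the mean library size, and that lib_size defaults to the column sums of y.

```python
import numpy as np


def add_prior_count_sketch(y, lib_size=None, prior_count=1.0):
    """Illustrative sketch only; hypothetical helper, not the inmoose code."""
    y = np.asarray(y, dtype=float)
    if lib_size is None:
        # Assumption: fall back to the column sums as library sizes.
        lib_size = y.sum(axis=0)
    lib_size = np.asarray(lib_size, dtype=float)

    # Assumption: the prior is scaled by library size relative to the mean.
    scaled_prior = prior_count * lib_size / lib_size.mean()

    # Add the scaled prior to every count (broadcast across genes/rows).
    adjusted_counts = y + scaled_prior

    # Add twice the scaled prior to each library size, then log-transform.
    log_lib_size = np.log(lib_size + 2.0 * scaled_prior)
    return adjusted_counts, log_lib_size


# Two libraries of very different depth (column sums 100 and 400).
y = np.array([[0, 10],
              [5, 0],
              [95, 390]])
counts, log_lib = add_prior_count_sketch(y, prior_count=1.0)

# Proportion of each adjusted count within its adjusted library.
proportions = counts / np.exp(log_lib)
```

Under these assumptions, the zero counts in the two libraries map to the same proportion even though the libraries differ fourfold in size, and every proportion lies strictly between zero and one.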

If offset is supplied, this is used in favor of lib_size, where exp(offset) is defined as the vector/matrix of library sizes. If an offset matrix is supplied, this will lead to gene-specific scaling of the prior as described above.
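As a rough illustration of the relationship between offsets and library sizes, the sketch below uses plain NumPy and does not call inmoose. The variable names are hypothetical, and the row-wise averaging used to scale the prior for a matrix offset is an assumption rather than something this page specifies.

```python
import numpy as np

# A vector offset is the log of the library sizes, so exp() recovers them.
lib_size = np.array([1.0e6, 4.0e6])
offset = np.log(lib_size)
recovered = np.exp(offset)

# A matrix offset gives each gene (row) its own effective library sizes.
offset_matrix = np.log(np.array([[1.0e6, 4.0e6],
                                 [2.0e6, 2.0e6]]))
effective_lib = np.exp(offset_matrix)

# Assumption: the prior is scaled per gene, relative to that gene's mean
# effective library size.
prior_count = 1.0
scaled_prior = prior_count * effective_lib / effective_lib.mean(axis=1, keepdims=True)
```

Here the first gene receives a different scaled prior in each library, while the second gene, whose effective library sizes are equal, receives the same prior in both.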

Most use cases of this function will involve supplying a constant value to prior_count for all genes. However, it is also possible to use gene-specific values by supplying a vector of length equal to the number of rows in y.
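The difference between a constant and a gene-specific prior can be sketched with NumPy broadcasting. This is an illustration with hypothetical variable names, not the library's code, and it assumes the prior is scaled by library size relative to the mean library size.

```python
import numpy as np

# Count matrix: 3 genes (rows) by 2 libraries (columns).
y = np.array([[0.0, 10.0],
              [5.0, 0.0],
              [95.0, 390.0]])
lib_size = y.sum(axis=0)

# Assumption: scaling factor is library size relative to the mean.
relative_size = lib_size / lib_size.mean()

# Constant prior: one scaled value per library, shared by all genes.
prior_const = 1.0
scaled_const = prior_const * relative_size

# Gene-specific prior: one value per row of y, giving one scaled value
# per gene and library.
prior_gene = np.array([0.5, 1.0, 2.0])
scaled_gene = prior_gene[:, None] * relative_size[None, :]

adjusted_const = y + scaled_const   # broadcasts across rows
adjusted_gene = y + scaled_gene     # element-wise addition
```

The gene-specific vector must have one entry per row of y; otherwise the broadcast in this sketch would fail or silently align with the wrong axis.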

Parameters:
  • y (matrix) – a numeric count matrix, with rows corresponding to genes and columns to libraries

  • lib_size (array, optional) – a numeric vector of library sizes

  • offset (array, optional) – a numeric vector or matrix of offsets

  • prior_count (float or array) – a constant or gene-specific vector of prior counts to be added to the counts

Returns:

  • ndarray – matrix of counts with the added priors

  • ndarray – the log-transformed modified library sizes