inmoose.edgepy.DGEList

class inmoose.edgepy.DGEList(counts, lib_size=None, norm_factors=None, samples=None, group=None, group_col='group', genes=None, remove_zeroes=False)

A class for storing read counts and associated information from digital gene expression or sequencing technologies.

counts

matrix of read counts, one row per gene and one column per sample

Type:

pd.DataFrame

samples

DataFrame with one row per sample and columns group, lib_size and norm_factors, containing respectively the group labels, library sizes and normalization factors. Further columns may be added to give more detailed sample information.

Type:

pd.DataFrame
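The sketch below builds a table with this layout by hand using pandas, to make its shape concrete. The gene and sample names, the extra batch column, and the use of column sums as library sizes are illustrative assumptions. In practice the DGEList constructor is expected to populate this table itself.

```python
import numpy as np
import pandas as pd

# Hypothetical counts: 4 genes (rows) x 3 samples (columns)
counts = pd.DataFrame(
    [[10, 0, 5], [3, 8, 1], [0, 0, 0], [20, 15, 30]],
    index=["gene1", "gene2", "gene3", "gene4"],
    columns=["s1", "s2", "s3"],
)

# One row per sample, with the three standard columns
samples = pd.DataFrame(
    {
        "group": ["A", "A", "B"],
        "lib_size": counts.sum(axis=0).to_numpy(),  # total count per sample
        "norm_factors": np.ones(counts.shape[1]),   # no normalization yet
    },
    index=counts.columns,
)

# Extra annotation columns can be appended freely (hypothetical "batch")
samples["batch"] = ["b1", "b2", "b1"]

# Effective library size: lib_size scaled by norm_factors
effective_lib_size = samples["lib_size"] * samples["norm_factors"]
```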

common_dispersion

the common dispersion estimate, a single value shared by all genes

Type:

float, optional
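A dispersion value acts through the negative binomial variance function. This sketch assumes that edgepy, as a port of edgeR, uses edgeR's parameterization Var(y) = mu + phi * mu**2. The means and the dispersion value are made up for illustration.

```python
import numpy as np

def nb_variance(mu, dispersion):
    """Variance of a negative binomial count with mean mu and dispersion phi.

    Assumed edgeR-style parameterization: Var(y) = mu + phi * mu**2.
    phi = 0 recovers the Poisson variance (Var = mu).
    """
    mu = np.asarray(mu, dtype=float)
    return mu + dispersion * mu**2

# With a common dispersion, the same phi is applied to every gene
means = np.array([1.0, 10.0, 100.0])
common_dispersion = 0.1
variances = nb_variance(means, common_dispersion)

# The biological coefficient of variation (BCV) is sqrt(phi)
bcv = np.sqrt(common_dispersion)
```

The quadratic term dominates for highly expressed genes, which is why the dispersion matters most there.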

tagwise_dispersion

genewise dispersion estimates for each gene (“tag” and “gene” are synonymous here)

Type:

ndarray, optional

trended_dispersion

trended dispersion estimates for each gene, i.e. dispersions predicted from a trend fitted against average expression

Type:

ndarray, optional

offset

matrix of the same shape as counts giving offsets for log-linear models

Type:

array_like, optional
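An offset is a known term added to the linear predictor of each gene and sample: log(mu[g, i]) = design[i] @ beta[g] + offset[g, i]. The sketch below assumes the common choice of log effective library sizes as offsets, and shows that a per-sample offset vector is equivalent to a genes x samples matrix with identical rows. The library sizes, design and coefficients are hypothetical.

```python
import numpy as np

# Hypothetical effective library sizes (lib_size * norm_factors)
eff_lib = np.array([1.0e6, 2.0e6, 1.5e6])
n_genes = 4

# A per-sample offset vector ...
offset_vec = np.log(eff_lib)
# ... equals a genes x samples matrix whose rows are all identical
offset_mat = np.broadcast_to(offset_vec, (n_genes, eff_lib.size))

# Design: intercept plus an indicator for the third sample's group
design = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
beta = np.array([[-10.0, 0.5]] * n_genes)  # hypothetical coefficients per gene

# Expected counts under the log-linear model (genes x samples)
mu = np.exp(beta @ design.T + offset_mat)
```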

genes

annotation information for each gene, with the same number of rows as counts

Type:

pd.DataFrame, optional

AveLogCPM

average log2 counts per million for each gene

Type:

ndarray, optional
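For intuition about what this quantity measures, the sketch below computes a simplified average log2 counts per million: add a prior count to avoid log(0), convert to CPM, take log2, and average across samples. This is an approximation with a hypothetical helper name and prior; edgeR's aveLogCPM, which edgepy ports, uses a model-based average rather than this plain mean.

```python
import numpy as np

def approx_ave_log_cpm(counts, lib_size, prior_count=2.0):
    """Simplified average log2-CPM for each gene (row of counts).

    Adds prior_count to every count and 2 * prior_count to every library
    size, converts to counts per million, takes log2 and averages over
    samples. Not edgepy's exact algorithm.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = np.asarray(lib_size, dtype=float)
    cpm = (counts + prior_count) / (lib_size + 2 * prior_count) * 1e6
    return np.log2(cpm).mean(axis=1)
```

The prior count keeps genes with all-zero counts finite on the log scale.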

__init__(counts, lib_size=None, norm_factors=None, samples=None, group=None, group_col='group', genes=None, remove_zeroes=False)

Construct a DGEList object from its components, with some input checking.

Parameters:
  • counts (array_like or pd.DataFrame) – matrix of counts

  • lib_size (array_like, optional) – vector of total counts (sequencing depth) for each library

  • norm_factors (array_like, optional) – vector of normalization factors that modify the library sizes

  • samples (pd.DataFrame, optional) – information for each sample

  • group (array_like or Factor, optional) – vector or factor giving the experimental group/condition for each sample/library

  • group_col (str) – the name of the column containing the group information in samples. Only used if group is not None

  • genes (pd.DataFrame, optional) – annotation information for each gene

  • remove_zeroes (bool) – whether to remove rows (genes) whose total count across all samples is zero
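The following sketch shows how the optional inputs are likely filled in when omitted. It assumes edgepy follows edgeR's defaults (library sizes default to the column sums of counts, normalization factors default to 1); the helper default_components is hypothetical and not part of edgepy.

```python
import numpy as np

def default_components(counts, lib_size=None, norm_factors=None, remove_zeroes=False):
    """Hypothetical helper mirroring edgeR-style defaults for DGEList inputs."""
    counts = np.asarray(counts, dtype=float)
    if remove_zeroes:
        # Drop genes (rows) whose total count across all samples is zero
        counts = counts[counts.sum(axis=1) > 0]
    if lib_size is None:
        # Default library size: total count of each sample (column sums)
        lib_size = counts.sum(axis=0)
    if norm_factors is None:
        # Default: no normalization, every factor equals 1
        norm_factors = np.ones(counts.shape[1])
    return counts, np.asarray(lib_size, dtype=float), np.asarray(norm_factors, dtype=float)
```

Because all-zero rows contribute nothing to the column sums, removing them does not change the default library sizes.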

Methods

__init__(counts[, lib_size, norm_factors, ...])

Construct a DGEList object from its components, with some input checking.

aveLogCPM([normalized_lib_sizes, ...])

Compute average log2 counts per million for each row of counts.

estimateGLMCommonDisp([design, method, ...])

Estimate a common negative binomial dispersion parameter for a DGE dataset with a general experimental design.
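estimateGLMCommonDisp estimates the dispersion by likelihood methods (in edgeR the default is Cox-Reid adjusted profile likelihood). For intuition only, the sketch below swaps in a much cruder method-of-moments estimate, derived from Var = mu + phi * mu**2, for a single group of replicates with equal library sizes. It is not what edgepy computes, and the function name is hypothetical.

```python
import numpy as np

def moment_common_dispersion(counts):
    """Crude method-of-moments common dispersion for one group.

    Assumes every column is a replicate with the same library size.
    Per gene: phi_g = (var - mean) / mean**2, from Var = mu + phi * mu**2.
    Returns the average over genes with a positive mean, floored at zero.
    """
    counts = np.asarray(counts, dtype=float)
    m = counts.mean(axis=1)
    v = counts.var(axis=1, ddof=1)  # unbiased sample variance across replicates
    keep = m > 0
    phi = (v[keep] - m[keep]) / m[keep] ** 2
    return max(0.0, float(phi.mean()))
```

A gene whose replicates vary no more than Poisson noise would allow yields a negative per-gene value, hence the floor at zero.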

estimateGLMTagwiseDisp([design, prior_df, ...])

Compute an empirical Bayes estimate of the negative binomial dispersion parameter for each tag, with expression levels specified by a log-linear model.

getDispersion()

Get the most complex dispersion values available from the DGEList object.
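"Most complex" refers to an order of preference between the stored estimates. The sketch below assumes edgepy keeps edgeR's getDispersion order: genewise (tagwise) if present, otherwise trended, otherwise common. The function is a hypothetical stand-in that works on plain values rather than a DGEList.

```python
def most_complex_dispersion(common=None, trended=None, tagwise=None):
    """Hypothetical sketch: return (kind, value) for the most complex estimate.

    Preference order (as in edgeR's getDispersion): tagwise, then trended,
    then common. Returns (None, None) if no dispersion has been estimated.
    """
    if tagwise is not None:
        return "tagwise", tagwise
    if trended is not None:
        return "trended", trended
    if common is not None:
        return "common", common
    return None, None
```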

getOffset()

Extract the offset vector or matrix from the data object and optional arguments.
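This sketch shows the fallback logic, assuming edgepy mirrors edgeR's getOffset: an explicitly stored offset is returned as is; otherwise the offset is the natural log of the library sizes, scaled by the normalization factors when these are present. The helper name is hypothetical.

```python
import numpy as np

def offset_or_default(offset, lib_size, norm_factors=None):
    """Hypothetical sketch of edgeR-style getOffset logic."""
    if offset is not None:
        # An explicitly stored offset takes precedence
        return np.asarray(offset, dtype=float)
    lib_size = np.asarray(lib_size, dtype=float)
    if norm_factors is not None:
        # Effective library size = lib_size * norm_factors
        lib_size = lib_size * np.asarray(norm_factors, dtype=float)
    # Natural log, matching the log link of the NB GLM
    return np.log(lib_size)
```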

glmFit([design, dispersion, prior_count, start])

Fit a negative binomial generalized log-linear model to the read counts for each gene.
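To illustrate what is fitted for each gene, the sketch below solves log(mu) = design @ beta + offset for a single gene with a fixed dispersion, using textbook iteratively reweighted least squares (Fisher scoring). This is a swapped-in technique, not edgepy's implementation, and nb_glm_irls is a hypothetical name. With a log link and Var = mu + phi * mu**2, the working weights are mu / (1 + phi * mu).

```python
import numpy as np

def nb_glm_irls(y, design, dispersion, offset=None, max_iter=50, tol=1e-10):
    """Fit log(mu) = design @ beta + offset for one gene by IRLS.

    Negative binomial with fixed dispersion phi: Var(y) = mu + phi * mu**2.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(design, dtype=float)
    offset = np.zeros_like(y) if offset is None else np.asarray(offset, dtype=float)
    # Starting values: least squares on log(y + 0.5) with the offset removed
    beta, *_ = np.linalg.lstsq(X, np.log(y + 0.5) - offset, rcond=None)
    for _ in range(max_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        w = mu / (1.0 + dispersion * mu)     # working weights for the log link
        z = (eta - offset) + (y - mu) / mu   # working response, offset excluded
        XtW = X.T * w                        # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

With an intercept-plus-group design and no offset, the fitted means equal the group means, whatever the dispersion: for counts [4, 6, 20, 30] in two groups of two, both coefficients come out as log(5).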

glmQLFit([design, dispersion, ...])

Fit a quasi-likelihood negative binomial generalized log-linear model to count data.

predFC(design[, prior_count, offset, ...])

Compute estimated coefficients for a negative binomial GLM in such a way that the log-fold-changes are shrunk towards zero.
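The shrinkage comes from adding a small prior count before taking logs, which pulls fold changes towards zero most strongly when counts are small. The sketch below shows this effect in a simplified two-sample form; it is not predFC's GLM-based computation, and both the helper name and its default prior_count are hypothetical.

```python
import numpy as np

def shrunk_log2_fc(count_a, count_b, lib_a, lib_b, prior_count=1.0):
    """Log2 fold change of B over A after adding a prior count to each count.

    Simplified two-sample illustration of prior-count shrinkage,
    not edgepy's predFC.
    """
    rate_a = (count_a + prior_count) / lib_a
    rate_b = (count_b + prior_count) / lib_b
    return np.log2(rate_b / rate_a)
```

For example, with equal library sizes, counts of 1 versus 7 give a raw log2 fold change of about 2.81 but a shrunk value of 2.0, while 1000 versus 7000 stays close to the raw value.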

splitIntoGroups()

Split the counts matrix according to group.
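Splitting by group means selecting, for each group level, the columns (samples) that belong to it. The sketch below is a hypothetical stand-in that returns a dict keyed by group level in sorted order; edgepy's return type and level ordering may differ.

```python
import numpy as np

def split_into_groups(counts, group):
    """Hypothetical sketch: one genes x samples submatrix per group level."""
    counts = np.asarray(counts)
    group = np.asarray(group)
    # Boolean column mask per level keeps all genes, only that group's samples
    return {level: counts[:, group == level] for level in np.unique(group)}
```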