inmoose.edgepy.DGEList

class inmoose.edgepy.DGEList(counts, lib_size=None, norm_factors=None, samples=None, group=None, group_col='group', genes=None, remove_zeroes=False)

A class for storing read counts and associated information from digital gene expression or sequencing technologies.

counts

matrix of read counts, one row per gene and one column per sample

Type: pd.DataFrame

samples

dataframe with a row for each sample and columns group, lib_size and norm_factors containing the group labels, library sizes and normalization factors. Other columns can be optionally added to give more detailed sample information.

Type: pd.DataFrame

common_dispersion

the overall dispersion estimate

Type: float, optional

tagwise_dispersion

genewise dispersion estimates for each gene (“tag” and “gene” are synonymous here)

Type: ndarray, optional

trended_dispersion

trended dispersion estimates for each gene

Type: ndarray, optional

offset

matrix of same shape as counts giving offsets for log-linear models

Type: array_like, optional

genes

annotation information for each gene, with the same number of rows as counts

Type: pd.DataFrame, optional

AveLogCPM

average log2 counts per million for each gene

Type: ndarray, optional

__init__(counts, lib_size=None, norm_factors=None, samples=None, group=None, group_col='group', genes=None, remove_zeroes=False)

Construct DGEList object from components with some checking

Parameters:

counts (array_like or pd.DataFrame) – matrix of counts
lib_size (array_like, optional) – vector of total counts (sequencing depth) for each library
norm_factors (array_like, optional) – vector of normalization factors that modify the library sizes
samples (pd.DataFrame, optional) – information for each sample
group (array_like or Factor, optional) – vector or factor giving the experimental group/condition for each sample/library
group_col (str) – the name of the column containing the group information in samples. Only used if group is not None
genes (pd.DataFrame, optional) – annotation information for each gene
remove_zeroes (bool) – whether to remove rows that have 0 total count
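
The roles of lib_size, norm_factors and remove_zeroes can be illustrated with a minimal plain-Python sketch. It does not call inmoose and is not its implementation: the helper names drop_zero_rows and library_sizes are hypothetical, and the sketch assumes, following the edgeR convention, that library sizes default to the column sums of counts and that the effective library size is lib_size multiplied by norm_factors.

```python
# Illustrative stand-ins only; these are not inmoose functions.
counts = [
    [10, 0, 5],    # gene A
    [0, 0, 0],     # gene B: zero total count across all samples
    [90, 40, 15],  # gene C
]
norm_factors = [1.0, 1.25, 0.5]  # one factor per sample (made-up values)


def drop_zero_rows(matrix):
    """Keep only the rows (genes) whose total count is positive."""
    return [row for row in matrix if sum(row) > 0]


def library_sizes(matrix):
    """Total count per column (sample), assumed here to be the default lib_size."""
    return [sum(column) for column in zip(*matrix)]


filtered = drop_zero_rows(counts)   # mimics remove_zeroes=True
lib_size = library_sizes(filtered)  # one total per sample
# Assumed convention: normalization factors rescale the library sizes.
effective_lib_size = [size * factor for size, factor in zip(lib_size, norm_factors)]
```

Dropping all-zero rows leaves the column sums unchanged, so in this sketch the order of filtering and summing does not affect lib_size.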

Methods

`__init__`(counts[, lib_size, norm_factors, ...])	Construct DGEList object from components with some checking
`aveLogCPM`([normalized_lib_sizes, ...])	Compute average log2 counts per million for each row of counts.
`estimateGLMCommonDisp`([design, method, ...])	Estimate a common negative binomial dispersion parameter for a DGE dataset with a general experimental design.
`estimateGLMTagwiseDisp`([design, prior_df, ...])	Compute an empirical Bayes estimate of the negative binomial dispersion parameter for each tag, with expression levels specified by a log-linear model.
`getDispersion`()	Get most complex dispersion values from DGEList object
`getOffset`()	Extract offset vector or matrix from data object and optional arguments.
`glmFit`([design, dispersion, prior_count, start])	Fit a negative binomial generalized log-linear model to the read counts for each gene.
`glmQLFit`([design, dispersion, ...])	Fit a quasi-likelihood negative binomial generalized log-linear model to count data.
`predFC`(design[, prior_count, offset, ...])	Compute estimated coefficients for a negative binomial GLM in such a way that the log-fold-changes are shrunk towards zero.
`splitIntoGroups`()	Split the counts according to group
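
As a rough illustration of what splitting counts by group means, the sketch below partitions the sample columns of a small count matrix by group label. It uses plain Python only; split_into_groups is a hypothetical helper, and the dictionary it returns is an assumption made for this example, not the documented return type of splitIntoGroups.

```python
from collections import defaultdict


def split_into_groups(counts, group):
    """Partition the columns of `counts` by the label in `group`.

    `counts` is a list of rows (genes); `group` holds one label per column
    (sample). Returns a dict mapping each label to the sub-matrix of its samples.
    """
    columns_by_group = defaultdict(list)
    for column_index, label in enumerate(group):
        columns_by_group[label].append(column_index)
    return {
        label: [[row[i] for i in indices] for row in counts]
        for label, indices in columns_by_group.items()
    }


counts = [
    [1, 2, 3, 4],  # gene A
    [5, 6, 7, 8],  # gene B
]
group = ["ctl", "trt", "ctl", "trt"]  # made-up group labels, one per sample
by_group = split_into_groups(counts, group)
```

Each sub-matrix keeps one row per gene, so per-group statistics can be computed row by row.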