inmoose.edgepy.DGEList
- class inmoose.edgepy.DGEList(counts, lib_size=None, norm_factors=None, samples=None, group=None, group_col='group', genes=None, remove_zeroes=False)
A class for storing read counts and associated information from digital gene expression or sequencing technologies.
- counts
matrix of read counts, one row per gene and one column per sample
- Type:
pd.DataFrame
- samples
dataframe with a row for each sample and columns group, lib_size and norm_factors containing the group labels, library sizes and normalization factors. Other columns can be optionally added to give more detailed sample information.
- Type:
pd.DataFrame
- common_dispersion
the overall dispersion estimate
- Type:
float, optional
- tagwise_dispersion
genewise dispersion estimates for each gene (“tag” and “gene” are synonymous here)
- Type:
ndarray, optional
- trended_dispersion
trended dispersion estimates for each gene
- Type:
ndarray, optional
- offset
matrix of same shape as counts giving offsets for log-linear models
- Type:
array_like, optional
- genes
annotation information for each gene. Same number of rows as counts
- Type:
DataFrame, optional
- AveLogCPM
average log2 counts per million for each gene
- Type:
ndarray, optional
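The lib_size and norm_factors columns of samples are typically combined into an effective library size, which is then used to scale counts to counts per million. The sketch below illustrates that arithmetic in plain Python on made-up data. It assumes the edgeR convention (effective library size = lib_size * norm_factors) and is not the inmoose implementation; the helper name cpm is hypothetical.

```python
# Toy data: 3 genes (rows) x 2 samples (columns). All values are made up.
counts = [
    [10, 20],
    [0, 5],
    [90, 75],
]

# Library size per sample: total count in each column.
lib_size = [sum(col) for col in zip(*counts)]

# Hypothetical normalization factors, one per sample.
norm_factors = [1.0, 0.8]

# Assumption (edgeR convention): effective library size is the product
# of the library size and the normalization factor.
eff_lib_size = [size * factor for size, factor in zip(lib_size, norm_factors)]


def cpm(counts, eff_lib_size):
    """Scale each count to counts per million of its sample's effective library size."""
    return [
        [count * 1e6 / size for count, size in zip(row, eff_lib_size)]
        for row in counts
    ]


cpm_values = cpm(counts, eff_lib_size)
```

Note that the AveLogCPM attribute is a per-gene summary on the log2 scale; this sketch only shows the per-sample scaling step and does not reproduce that computation.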
- __init__(counts, lib_size=None, norm_factors=None, samples=None, group=None, group_col='group', genes=None, remove_zeroes=False)
Construct DGEList object from components with some checking
- Parameters:
counts (array_like or pd.DataFrame) – matrix of counts
lib_size (array_like, optional) – vector of total counts (sequence depth) for each library
norm_factors (array_like, optional) – vector of normalization factors that modify the library sizes
samples (pd.DataFrame, optional) – information for each sample
group (array_like or Factor, optional) – vector or factor giving the experimental group/condition for each sample/library
group_col (str) – the name of the column containing the group information in samples. Only used if group is not None
genes (pd.DataFrame, optional) – annotation information for each gene
remove_zeroes (bool) – whether to remove rows that have 0 total count
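To make the lib_size and remove_zeroes parameters concrete, here is a minimal plain-Python sketch on made-up data. It assumes that, when lib_size is not supplied, library sizes default to the column sums of counts (the edgeR convention; check the inmoose source to confirm). It illustrates the described behavior and does not call inmoose.

```python
# Toy count matrix: 3 genes (rows) x 3 samples (columns). Values are made up.
counts = [
    [5, 0, 3],
    [0, 0, 0],  # this gene has a total count of 0
    [2, 7, 1],
]
gene_ids = ["geneA", "geneB", "geneC"]  # hypothetical gene names

# Assumed default for lib_size: total count (sequencing depth) per sample.
lib_size = [sum(col) for col in zip(*counts)]

# Effect described for remove_zeroes=True: drop rows whose total count is 0.
kept = [(gene, row) for gene, row in zip(gene_ids, counts) if sum(row) > 0]
kept_ids = [gene for gene, _ in kept]
kept_counts = [row for _, row in kept]
```

Dropping all-zero rows does not change the column sums, so the library sizes are the same whether they are computed before or after the filtering.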
Methods
__init__(counts[, lib_size, norm_factors, ...]) – Construct DGEList object from components with some checking
aveLogCPM([normalized_lib_sizes, ...]) – Compute average log2 counts per million for each row of counts.
estimateGLMCommonDisp([design, method, ...]) – Estimate a common negative binomial dispersion parameter for a DGE dataset with a general experimental design.
estimateGLMTagwiseDisp([design, prior_df, ...]) – Compute an empirical Bayes estimate of the negative binomial dispersion parameter for each tag, with expression levels specified by a log-linear model.
getDispersion() – Get most complex dispersion values from DGEList object
getOffset() – Extract offset vector or matrix from data object and optional arguments.
glmFit([design, dispersion, prior_count, start]) – Fit a negative binomial generalized log-linear model to the read counts for each gene.
glmQLFit([design, dispersion, ...]) – Fit a quasi-likelihood negative binomial generalized log-linear model to count data.
predFC(design[, prior_count, offset, ...]) – Compute estimated coefficients for a negative binomial GLM in such a way that the log-fold-changes are shrunk towards zero.
splitIntoGroups() – Split the counts according to group
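As a rough illustration of what splitting counts according to group means, the sketch below partitions the sample columns of a toy count matrix by group label. The function split_into_groups is a hypothetical plain-Python stand-in; the actual return type and ordering of DGEList.splitIntoGroups() are not specified here and may differ.

```python
from collections import defaultdict


def split_into_groups(counts, group):
    """Partition the columns of a genes x samples matrix by group label.

    Hypothetical helper for illustration only; not the inmoose implementation.
    """
    columns = list(zip(*counts))  # one tuple per sample
    by_group = defaultdict(list)
    for label, column in zip(group, columns):
        by_group[label].append(column)
    # Transpose each group's columns back to a genes x samples layout.
    return {
        label: [list(row) for row in zip(*cols)]
        for label, cols in by_group.items()
    }


# Toy data: 2 genes x 4 samples, two samples per group. Values are made up.
counts = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
group = ["ctl", "ctl", "trt", "trt"]

groups = split_into_groups(counts, group)
```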