inmoose.edgepy.topTags

inmoose.edgepy.topTags(self, n=10, adjust_method='fdr_bh', sort_by='PValue', p_value=1)

Extract the most differentially expressed genes (or sequence tags) from a test object, ranked either by p-value or by absolute log-fold-change.

This function accepts a test statistic object created by any of the functions exactTest(), glmLRT(), glmTreat() or glmQLFTest() and extracts a readable dataframe of the most differentially expressed genes. The dataframe collates the annotation and differential expression statistics for the top genes. The dataframe is wrapped in a TopTags object that records the test statistic used and the multiple testing adjustment method.

topTags() permits ranking by fold-change, but the authors do not recommend fold-change ranking or fold-change cutoffs for routine RNA-Seq analysis. The p-value ranking is intended to be more biologically meaningful, especially if the p-values were computed using glmTreat().

Parameters:

self (DGEExact or DGELRT) – object containing test statistics and p-values
n (int) – maximum number of genes/tags to return
adjust_method (str) – method used to adjust p-values for multiple testing. See statsmodels.stats.multitest.multipletests() for possible values. Also accepts the values accepted by p.adjust from the R stats package.
sort_by ({"PValue", "logFC", "none"}) –
specify the sort method
- "PValue" to sort by p-value
- "logFC" to sort by absolute log-fold-change
- "none" for no sorting
p_value (float) – cutoff value for adjusted p-values. Only tags with an adjusted p-value less than or equal to this cutoff are returned.
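
The interplay of n, sort_by and p_value can be illustrated with a minimal pure-Python sketch. It is not the inmoose implementation: the helper names bh_adjust and select_top, the toy rows and the order of the steps (filter on the adjusted p-value, then keep the first n rows) are assumptions chosen to match the description on this page, and only Benjamini-Hochberg adjustment (the "fdr_bh" default) is modelled.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so that each adjusted value is
    # no larger than the one ranked just above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_top(rows, n=10, sort_by="PValue", p_value=1.0):
    """Hypothetical helper: rows is a list of dicts with 'logFC' and 'PValue'."""
    adj = bh_adjust([row["PValue"] for row in rows])
    table = [dict(row, adj_pvalue=a) for row, a in zip(rows, adj)]
    if sort_by == "PValue":
        table = sorted(table, key=lambda r: r["PValue"])
    elif sort_by == "logFC":
        # rank by absolute log-fold-change, largest first
        table = sorted(table, key=lambda r: abs(r["logFC"]), reverse=True)
    elif sort_by != "none":
        raise ValueError(f"unknown sort_by: {sort_by!r}")
    # Assumed order of operations: apply the cutoff, then keep at most n rows.
    kept = [r for r in table if r["adj_pvalue"] <= p_value]
    return kept[:n]


# Toy data, invented for illustration only.
rows = [
    {"gene": "g0", "logFC": 1.0, "PValue": 0.01},
    {"gene": "g1", "logFC": -3.0, "PValue": 0.04},
    {"gene": "g2", "logFC": 0.5, "PValue": 0.03},
    {"gene": "g3", "logFC": 2.0, "PValue": 0.20},
]
top = select_top(rows, n=2, sort_by="PValue", p_value=0.1)
print([r["gene"] for r in top])
```

With sort_by="PValue" the adjusted p-values are monotone in the raw p-values, so filtering before or after truncating to n gives the same rows; with sort_by="logFC" the two orders can differ, which is why the sketch states its choice explicitly.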

Returns:

a dataframe containing differential expression results for the top genes in a sorted order. The number of rows is the smaller of n and the number of genes with adjusted p-value less than or equal to p_value. The dataframe includes all the annotation columns from self.genes and all statistic columns from self plus one of:

"FDR", false discovery rate (only when adjust_method is "fdr_bh" or "fdr_by")
"FWER", family-wise error rate (only when adjust_method is "holm", "simes-hochberg", "hommel" or "bonferroni")

For consistency with other modules, the dataframe also contains an "adj_pvalue" column with the same content as the "FDR" or "FWER" column.
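
The naming rule for the adjusted p-value column can be summarised as a small lookup. This is a sketch of the rule as stated above, not code taken from inmoose; the function name adjustment_column is hypothetical, and returning None for any method not listed on this page is an assumption of the sketch.

```python
# Method names as listed on this page.
FDR_METHODS = {"fdr_bh", "fdr_by"}
FWER_METHODS = {"holm", "simes-hochberg", "hommel", "bonferroni"}


def adjustment_column(adjust_method):
    """Hypothetical helper: name of the extra column for a given method."""
    if adjust_method in FDR_METHODS:
        return "FDR"
    if adjust_method in FWER_METHODS:
        return "FWER"
    # Assumption: methods not listed on this page get no named column here.
    return None


print(adjustment_column("fdr_bh"))
print(adjustment_column("bonferroni"))
```

Code that should work regardless of the adjustment method can read the "adj_pvalue" column instead of branching on the column name.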

The object also contains the following components:

adjust_method, string specifying the method used to adjust p-values for multiple testing, same as input argument
comparison, the names of the two groups being compared (for DGEExact objects) or the glm contrast being tested (for DGELRT objects)
test, string stating the name of the test

Return type:

TopTags