inmoose.edgepy.topTags

inmoose.edgepy.topTags(self, n=10, adjust_method='fdr_bh', sort_by='PValue', p_value=1)

Extract the most differentially expressed genes (or sequence tags) from a test object, ranked either by p-value or by absolute log-fold-change.

This function accepts a test statistic object created by any of the functions exactTest(), glmLRT(), glmTreat() or glmQLFTest() and extracts a readable dataframe of the most differentially expressed genes. The dataframe collates the annotation and differential expression statistics for the top genes. The dataframe is wrapped in a TopTags object that records the test statistic used and the multiple testing adjustment method.

topTags() permits ranking by fold-change but the authors do not recommend fold-change ranking or fold-change cutoffs for routine RNA-Seq analysis. The p-value ranking is intended to be more biologically meaningful, especially if the p-values were computed using glmTreat().

Parameters:
  • self (DGEExact or DGELRT) – object containing test statistics and p-values

  • n (int) – maximum number of genes/tags to return

  • adjust_method (str) – method used to adjust p-values for multiple testing. See statsmodels.stats.multitest.multipletests() for possible values. The values accepted by p.adjust from the R stats package are also accepted.

  • sort_by ({"PValue", "logFC", "none"}) –

    specify the sort method

    • "PValue" to sort by p-value

    • "logFC" to sort by absolute log-fold-change

    • "none" for no sorting

  • p_value (float) – cutoff for adjusted p-values. Only tags with an adjusted p-value less than or equal to this value are returned.
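The effect of adjust_method can be previewed outside of topTags() by calling statsmodels directly. The sketch below is illustrative only: the p-values are made up, and it assumes that the adjusted p-values topTags() reports match what statsmodels.stats.multitest.multipletests() produces for the same method name.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values for five genes (illustrative data only).
pvalues = np.array([0.001, 0.008, 0.039, 0.041, 0.6])

# multipletests returns (reject, adjusted p-values, alphacSidak, alphacBonf);
# only the adjusted p-values are needed here.
_, fdr, _, _ = multipletests(pvalues, method="fdr_bh")
_, fwer, _, _ = multipletests(pvalues, method="bonferroni")

# Applying a cutoff, as the p_value argument does, keeps only the genes
# whose adjusted p-value is less than or equal to the cutoff.
n_kept_fdr = int((fdr <= 0.05).sum())
n_kept_fwer = int((fwer <= 0.05).sum())

print("FDR: ", fdr, "kept:", n_kept_fdr)
print("FWER:", fwer, "kept:", n_kept_fwer)
```

Bonferroni adjustment is never less conservative than Benjamini-Hochberg, so a given p_value cutoff keeps at least as many genes under "fdr_bh" as under "bonferroni".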

Returns:

a dataframe containing differential expression results for the top genes in a sorted order. The number of rows is the smaller of n and the number of genes with adjusted p-value less than or equal to p_value. The dataframe includes all the annotation columns from self.genes and all statistic columns from self plus one of:

  • "FDR", false discovery rate (only when adjust_method is "fdr_bh" or "fdr_by")

  • "FWER", family-wise error rate (only when adjust_method is "holm", "simes-hochberg", "hommel" or "bonferroni")

For consistency with other modules, the dataframe also contains an "adj_pvalue" column with the same content as the "FDR" or "FWER" column.
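The selection rules described above can be sketched with plain pandas. This is not the inmoose implementation: top_rows is a hypothetical helper, the data are invented, the adjusted p-values are supplied by hand, and the order of operations (sort, filter, then truncate to n rows) is an assumption chosen so that the row count matches the description.

```python
import pandas as pd

def top_rows(table, adjusted, n=10, sort_by="PValue", p_value=1.0):
    """Hypothetical helper mimicking the documented selection rules."""
    out = table.copy()
    # Adjusted p-values appear under "FDR" (an FDR-type method is assumed
    # here) and are mirrored in "adj_pvalue".
    out["FDR"] = adjusted
    out["adj_pvalue"] = out["FDR"]

    if sort_by == "PValue":
        out = out.sort_values("PValue")
    elif sort_by == "logFC":
        # Rank by absolute log-fold-change, largest first.
        out = out.sort_values("logFC", key=lambda s: s.abs(), ascending=False)
    elif sort_by != "none":
        raise ValueError(f"unknown sort_by: {sort_by!r}")

    # Keep rows at or below the cutoff, then return at most n of them.
    out = out[out["adj_pvalue"] <= p_value]
    return out.head(n)

# Invented statistics for five genes.
table = pd.DataFrame(
    {
        "logFC": [0.5, -3.0, 1.2, -0.2, 2.0],
        "PValue": [0.001, 0.008, 0.039, 0.041, 0.6],
    },
    index=["g1", "g2", "g3", "g4", "g5"],
)
adjusted = [0.005, 0.02, 0.05125, 0.05125, 0.6]

by_pvalue = top_rows(table, adjusted, n=10, sort_by="PValue", p_value=0.05)
by_logfc = top_rows(table, adjusted, n=3, sort_by="logFC")
print(by_pvalue)
print(by_logfc)
```

With a cutoff of 0.05 only two genes pass, so the result has two rows even though n is 10. With no cutoff and n=3, ranking by absolute log-fold-change returns the three genes with the largest effect regardless of sign.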

The object also contains the following components:

  • adjust_method, string specifying the method used to adjust p-values for multiple testing, same as input argument

  • comparison, the names of the two groups being compared (for DGEExact objects) or the glm contrast being tested (for DGELRT objects)

  • test, string stating the name of the test

Return type:

TopTags