edgepy
This module is a partial port in Python of the R Bioconductor edgeR package.
Only the functionalities necessary to inmoose.pycombat.pycombat_seq() and
differential expression analysis have been ported so far.
Differential Expression Analysis Example
We give below an example of how to use edgepy to perform a differential
expression analysis on the pasilla dataset.
>>> from inmoose.data.pasilla import pasilla
>>> from inmoose.edgepy import DGEList, glmLRT, topTags
>>> from patsy import dmatrix
>>>
>>> # load the pasilla dataset as an AnnData
>>> pas = pasilla()
>>>
>>> # extract the count matrix and the annotation dataframe from the AnnData object
>>> counts = pas.X.T
>>> anno = pas.obs
>>> # build the design matrix
>>> design = dmatrix("~condition", data=anno)
>>>
>>> # build a DGEList object
>>> dge_list = DGEList(counts=counts, samples=anno, group_col="condition", genes=pas.var)
>>> # estimate the dispersions
>>> dge_list.estimateGLMCommonDisp(design=design)
<inmoose.edgepy.DGEList.DGEList object at 0x7fd36af01690>
>>>
>>> # fit the GLM
>>> fit = dge_list.glmFit(design=design)
>>>
>>> # run a differential expression analysis based on LRT
>>> lrt = glmLRT(fit)
>>>
>>> topTags(lrt)
Coefficient: ['condition[T.untreated]']
logFC logCPM LR PValue
gene_id
FBgn0000003 4.545505 5.865509 393.393494 1.510489e-87
FBgn0000008 4.086659 4.382091 284.242373 8.935808e-64
FBgn0000014 -2.949233 6.940623 257.991933 4.701464e-58
FBgn0000015 3.126516 8.439421 248.166909 6.517421e-56
FBgn0000017 -3.016858 4.143243 219.858071 9.712747e-50
FBgn0000018 3.439621 4.173627 218.563065 1.861317e-49
FBgn0000022 4.801665 2.531154 208.407308 3.057399e-47
FBgn0000024 -2.731362 4.811909 200.637661 1.515937e-45
FBgn0000028 -3.453657 2.469177 187.408001 1.169845e-42
FBgn0000032 3.559537 3.203302 186.943820 1.477260e-42
References
Y. Chen, A.T.L Lun, G.K. Smyth. 2016. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Research 5, 1438. doi:10.12688/f1000research.8987.2
J.D. Gibbons, J.W. Pratt. 1975. P-values: interpretation and methodology. The American Statistician 29, 20-25. doi:10.1080/00031305.1975.10479106
A.T.L. Lun, Y. Chen, G.K. Smyth. 2016. It’s DE-licious: a recipe for differential expression analyses of RNA-seq experiments using quasi-likelihood methods in edgeR. Methods in Molecular Biology 1418, 391-416. doi:10.1007/978-1-4939-3578-9_19
S.P. Lund, D. Nettleton, D.J. McCarthy, G.K. Smyth. 2012. Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology Volume 11, Issue 5, Article 8. doi:10.1515/1544-6115.1826
A.T.L. Lun, G.K. Smyth. 2017. No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data. Statistical Applications in Genetics and Molecular Biology 16(2), 83-93. doi:10.1515/sagmb-2017-0010
D. J. McCarthy, Y. Chen, G. K. Smyth. 2012. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297. doi:10.1093/nar/gks042
B. Phipson, S. Lee, I.J. Majewski, W. S. Alexander, G.K. Smyth. 2016. Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10, 946-963. doi:10.1214/16-AOAS920
M.D. Robinson, g.K. Smyth. 2008. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9, 321-332. doi:10.1093/biostatistics/kxm030
Code documentation
|
A class for storing read counts and associated information from digital gene expression or sequencing technologies. |
|
Add a library size-adjusted prior count to each observation. |
|
Compute adjusted profile log-likelihoods for the dispersion parameters of genewise negative binomial GLMs. |
|
Compute average log2 counts per million for each row of counts. |
|
Computes p-values for differential abundance for each gene between two digital libraries, conditioning on the total counts for each gene. |
|
Convert a general design matrix into a oneway layout if that is possible. |
|
Estimate a common dispersion parameter across multiple negative binomial GLMs, by maximizing the Cox-Reid adjusted profile likelihood. |
|
Estimate genewise dispersion parameters across multiple negative binomial GLMs using weighted Cox-Reid adjusted profile likelihood and cubic spline interpolation over a genewise grid. |
|
Estimate a common negative binomial dispersion parameter for a DGE dataset with a general experimental design. |
|
Compute an empirical Bayes estimate of the negative binomial dispersion parameter for each tag, with expression levels specified by a log-linear model. |
|
Compute genewise exact tests for differences in the means between two groups of negative-binomially distributed counts. |
|
Compute genewise p-values for differences in the means between two groups of negative-binomially distributed counts. |
|
Compute genewise p-values for differences in the means between two groups of negative-binomially distributed counts. |
|
Compute genewise p-values for differences in the means between two groups of negative-binomially distributed counts. |
|
Compute genewise p-values for differences in the means between two groups of negative-binomially distributed counts. |
|
Fit a negative binomial generalized log-linear model to the read counts for each gene. |
|
Conduct genewise statistical tests for a given coefficient or coefficient contrast. |
|
Fit a quasi-likelihood negative binomial generalized log-linear model to count data. |
|
Conduct genewise statistical tests for a given coefficient or contrast |
|
Fit genewise negative binomial GLMs with log-link using Levenberg damping to ensure convergence. |
|
Fit single-group negative-binomial GLMs genewise. |
|
Fit multiple negative binomial GLMs with log-link by Fisher scoring with a single explanatory factor in the model. |
|
Moving average smoother for matrix columns. |
|
Residual deviances for row-wise negative binomial GLMs. |
|
Plot the genewise quasi-likelihood dispersion against the gene abundance (in log2 counts per million) |
|
Compute estimated coefficients for a negative binomial GLM in such a way that the log-fold-changes are shrunk towards zero. |
|
Split the counts according to group |
|
Take a systematic subset of indices stratified by a ranking variable |
|
Extract the most differentially expressed genes (or sequence tags) from a test object, ranked either by p-value or by absolute log-fold-change. |
|
Check for standard components of DGEList object |