limma
This module is a partial port in Python of the R Bioconductor limma package.
Introduction
limma is a package for the analysis of gene expression microarray data,
especially the use of linear models for analysing designed experiments and the
assessment of differential expression. limma provides the ability to
analyse comparisons between many RNA targets simultaneously in arbitrarily
complicated designed experiments. Empirical Bayesian methods are used to provide
stable results even when the number of arrays is small. The linear model and
differential expression functions apply to all gene expression technologies,
including microarrays, RNA-Seq and quantitative PCR.
There are three types of documentation available:
the LIMMA User’s Guide can be reached through the “User Guides and Package Vignettes” links at the top of the LIMMA contents page.
an overview of limma functions grouped by purpose is contained in the chapters of the present page
the Code documentation section gives an alphabetical index of detailed help topics
Classes Defined in this Module
This module defines the following data classes:
MArrayLM: store the result of fitting gene-wise linear models to the normalized intensities or log-ratios. Usually created bylmFit(). Objects of this class normally contain only one row for each unique probe.TestResults: store the results of testing a set of contrasts equal to zero for each probe. Usually created bydecideTests(). Objects of this class normally contain one row for each unique probe.
Linear Models for Microarrays
This section gives an overview of the LIMMA functions available to fit linear models and to interpret the results. This section covers models for two color arrays in terms of log-ratios or for single-channel arrays in terms of log-intensities. If you wish to fit models to the individual channel log-intensities from two color arrays, see Individual Channel Analysis of Two-Color Microarrays.
The core of this module is the fitting of gene-wise linear models to microarray data. The basic idea is to estimate log-ratios between two or more target RNA samples simultaneously. See the LIMMA User’s Guide for several case studies.
Fitting Models
The main function for model fitting is lmFit(). This is the recommended
interface for most users. lmFit() produces a fitted model object of class
MArrayLM containing coefficients, standard errors and residual standard
errors for each gene. lmFit() calls one of the following three functions
to do the actual computations:
lm_series()Straightforward least squares fitting of a linear model for each gene.mrlm()An alternative tolm_series()using robust regression as implemented by therlmfunction in the MASS package.gls_series()Generalized least squares taking into account correlations between duplicate spots (i.e. replicate spots on the same array) or related arrays. The functionduplicateCorrelation()is used to estimate the inter-duplicate or inter-block correlation before usinggls_series().
All the function which fit linear models use getEAWP() to extract data
from microarray data objects, and unwrapdups() which provides a unified
method for handling duplicate spots.
Forming the Design Matrix
lmFit() has two main arguments: the expression data and the design matrix.
The design matrix is essentially an indicator matrix which specifies which
target RNA samples were applied to each channel on each array. There is
considerable freedom in choosing the design matrix – there is always more than
one choice which is correct provided it is interpreted correctly.
Design matrices for Affymetrix or single-color arrays can be created using the
function patsy.dmatrix(). The function modelMatrix() is provided to
assist with the creation of an appropriate design matrix for two-color
microarray experiments. For direct two-color designs, without a common
reference, the design matrix often needs to be created by hand.
Making Comparisons of Interest
Once a linear model has been fit using an appropriate design matrix, the
function makeContrasts() may be used to form a contrast matrix to make
comparisons of interest. The fit and the contrast matrix are used by
contrasts_fit() to compute fold changes and t-statistics for the
contrasts of interest. This is a way to compute all possible pairwise
comparisons between treatments, for example in an experiment which compares many
treatments to a common reference.
Assessing Differential Expression
After fitting a linear model, the standard errors are moderated using a simple
empirical Bayes model using eBayes() or treat(). A moderated
t-statistic and a log-odds of differential expression is computed for each
contrast for each gene. treat() tests whether log-fold-changes are greater
than a threshold rather than merely different to zero.
eBayes() and treat() use internal functions squeezeVar(),
fitFDist(), tmixture_matrix() and tmixture_vector().
Summarizing Model Fits
After the above steps, the results may be displayed or further processed using:
topTable()Presents a list of the genes most likely to be differentially expressed for a given contrast or set of contrasts.topTableF()Presents a list of the genes most likely to be differentially expressed for a given set of contrasts. Equivalent totopTable()withcoefset to all the coefficients,coef=range(fit.shape[1]).volcanoplot()Volcano plot of fold change versus the B-statistic for any fitted coefficient.plotlines()Plots fitted coefficients or log-intensity values for time-course data.genas()Estimates biological correlation between two coefficients.write_fit()Writes aMArrayLMobject to a file. Note that iffitis aMArrayLMobject, eitherwrite_fit()orwrite_table()can be used to write the results to a delimited text file.
For multiple testing functions which operate on linear model fits, see Hypothesis Testing for Linear Models.
Model Selection
selectModel() provides a means to choose between alternative linear models
using AIC or BIC information criteria.
Individual Channel Analysis of Two-Color Microarrays
This section gives an overview of the LIMMA functions fit linear models to two-color microarray data in terms of the log-intensities rather than log-ratios.
The function intrapotCorrelation() estimates the intra-spot correlation
between the two channels. The regression function lmscFit() takes the
correlation as an argument and fits linear models to the two-color data in terms
of the individual log-intensities. The output of lmscFit() is a
MArrayLM object just the same as from lmFit(), so inference
proceeds the same way as for log-ratios once the linear model is fitted. See
Linear Models for Microarrays.
The function targetsA2C() converts two-color format target data frames to
single channel format, i.e. converts from array-per-line to channel-per-line,
to facilitate the formulation of the design matrix.
Hypothesis Testing for Linear Models
LIMMA provides a number of functions for multiple testing across both contrasts
and genes. The starting point is a MArrayLM object, called fit
say, resulting from fitting a linear model and running eBayes() and,
optionally, contrasts_fit(). See Linear Models for Microarrays or
Individual Channel Analysis of Two-Color Microarrays for details.
Multiple Testing across Genes and Contrasts
The key function is decideTests(). This function writes an object of class
TestResults, which is basically a matrix of \(-1\), \(0\) or
\(1\) elements, of the same dimension as fit_coefficients,
indicating whether each coefficient is significantly different from zero. A
number of multiple testing strategies are provided. decideTests() calls
classifyTestsF() to implement the nested F-test strategy.
selectModel() chooses between linear models for each probe using AIC or
BIC criteria. This is an alternative to hypothesis testing and can choose
between non-nested models.
A number of other functions are provided to display the results of
decideTests(). The function heatDiagram() displays the results in a
heat-map style display. This allows visual comparison of the results across many
different conditions in the linear model.
The functions vennCounts() and vennDiagram() provide Venn diagrams
style summaries of the results.
Summary and show() method exists for objects of class
TestResults.
The results from decideTests() can also be included when the results of a
linear model fit are written to a file using write_fit().
References
C.W. Law, Y. Chen, W. Shi, G.K. Smyth. 2014. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15(R29). doi:10.1186/gb-2014-15-2-r29
I. Loennstedt, T.P. Speed. 2002. Replicated microarray data. Statistica Sinica 12(31-46).
J. Michaud, K.M. Simpson, R. Escher, K. Buchet-Poyau, R. Beissbarth, C. Carmichael, M.E. Ritchie, F. Schutz, P. Cannon, M. Liu, X. Shen, Y. Ito, W.H. Raskind, M.S. Horwitz, M. Osato, D.R. Turner, T.P. Speed, M. Kavallaris, G.K. Smyth, H.S. Scott. 2008. Integrative analysis of RUNX1 downstream pathways and target genes. BMC Genomics 9(363). doi:10.1186/1471-2164-9-363
M.E. Ritchie, B. Phipson, D. Wu, Y. Hu, C.W. Law, W. Shi, G.K. Smyth. 2015. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7). doi:10.1093/nar/gkv007
M.A. Sartor, C.R. Tomlinson, S.C. Wesselkamper, S. Sivaganesan, G.D. Leikauf, M. Medvedovic. 2006. Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments. BMC Bioinformatics 7(538). doi:10.1186/1471-2105-7-538
G.K. Smyth. 2004. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3(1). doi:10.2202/1544-6115.1027
Code documentation
|
A class to store the results of fitting gene-wise linear models to a set of microarrays |
|
A matrix-based class for storing the results of simultaneous tests. |
|
For each gene, classify a series of related t-statistics as significantly up or down using nested F-tests. |
|
Compute contrasts from linear model fit |
|
Empirical Bayes statistics for differential expression |
|
Moment estimation of the parameters of a scaled F-distribution given one of the degrees of freedom. |
|
Fit linear models for each gene given a series of arrays |
|
Fit linear model to microarray data by ordinary least squares |
|
Construct the contrast matrix corresponding to specified contrasts of a set of parameters. |
|
Squeeze a set of sample variances together by computing empirical Bayes posterior means This function implements empirical Bayes algorithms proposed by [Smyth2004] and [Phipson2016]. |
|
Extract a table of the top-ranked genes from a linear model fit |
|
Estimate scale factor in mixture of t-distributions |
|
Estimate scale factor in mixture of t-distributions |