limma powers differential expression analyses for RNA-sequencing and microarray studies

limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package can now go beyond traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
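
The "information borrowing" mentioned in the abstract can be illustrated with a small sketch of empirical Bayes variance shrinkage for a two-group comparison of a single gene. This is not limma's code or API (limma is an R package). The function name `moderated_t` and the parameters `prior_var` and `prior_df` are hypothetical, and the prior values are supplied by hand here, whereas an empirical Bayes analysis would estimate them from the variances of all genes. The sketch only shows how a gene's residual variance can be pulled toward a shared prior value before the t-statistic is formed.

```python
from math import sqrt
from statistics import mean, variance


def moderated_t(group_a, group_b, prior_var, prior_df):
    """Illustrative two-group moderated t-statistic for one gene.

    prior_var and prior_df are ASSUMED inputs for this sketch; in an
    empirical Bayes analysis they would be estimated from all genes.
    Returns (moderated t, shrunken variance, total degrees of freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    resid_df = n_a + n_b - 2

    # Ordinary pooled residual variance for this gene.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / resid_df

    # Shrink the gene-wise variance toward the prior value, weighting
    # each by its degrees of freedom.
    post_var = (prior_df * prior_var + resid_df * pooled_var) / (
        prior_df + resid_df
    )

    # Difference in group means (log-expression scale assumed).
    diff = mean(group_b) - mean(group_a)

    # Moderated t uses the shrunken variance in place of pooled_var.
    t_mod = diff / sqrt(post_var * (1 / n_a + 1 / n_b))
    return t_mod, post_var, prior_df + resid_df


# Two samples per group: the gene-wise variance alone is very unstable.
t_mod, post_var, total_df = moderated_t(
    [5.0, 5.2], [6.0, 6.2], prior_var=0.1, prior_df=4
)
print(t_mod, post_var, total_df)
```

With only two samples per group, the ordinary variance estimate rests on two degrees of freedom; adding the prior degrees of freedom stabilizes the estimate, which is the sense in which small experiments gain from information shared across genes.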
