Differential expression analysis of RNA-seq data at single-base resolution

In derfinder, we fit linear models (as specified by equation (3.1) in the main text) at each base in the genome. To do this, we use methods for estimating regularized linear contrasts as implemented in the limma Bioconductor package (Smyth and others 2004, Smyth 2005). We use a customized version of the lmFit function, keeping the default parameters. For the two-group comparison presented in the manuscript, the test statistic s(l) is a moderated t-statistic. It is similar to the ordinary t-statistic obtained from testing whether β2(l) = 0, except that the standard error estimate for β2(l) used in its calculation is shrunk toward a prior variance estimate. This framework borrows information across bases, which stabilizes the per-base variance estimates and makes the statistical results more reliable.
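The shrinkage step can be illustrated with a minimal Python sketch for a single base in a two-group design. It is not the derfinder or limma implementation: the function names `pooled_variance` and `moderated_t` are hypothetical, and the prior variance and prior degrees of freedom are passed in as assumed values, whereas limma estimates them from the data across all bases.

```python
from math import sqrt
from statistics import mean


def pooled_variance(group1, group2):
    """Residual variance of the two-group linear model at one base."""
    m1, m2 = mean(group1), mean(group2)
    ss = sum((y - m1) ** 2 for y in group1) + sum((y - m2) ** 2 for y in group2)
    return ss / (len(group1) + len(group2) - 2)


def moderated_t(group1, group2, prior_var, prior_df):
    """Moderated t-statistic for the group difference (beta2) at one base.

    prior_var and prior_df are ASSUMED inputs here; in limma they are
    estimated by empirical Bayes from the variances of all bases.
    """
    n1, n2 = len(group1), len(group2)
    resid_df = n1 + n2 - 2
    s2 = pooled_variance(group1, group2)
    # Shrink the per-base variance toward the prior variance,
    # weighting each by its degrees of freedom.
    s2_post = (prior_df * prior_var + resid_df * s2) / (prior_df + resid_df)
    beta2 = mean(group2) - mean(group1)
    return beta2 / sqrt(s2_post * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    # Hypothetical coverage values for three samples per group at one base.
    control = [1.0, 2.0, 3.0]
    treated = [4.0, 5.0, 6.0]
    print(moderated_t(control, treated, prior_var=3.0, prior_df=4.0))
```

Setting `prior_df` to 0 recovers the ordinary pooled-variance t-statistic. A larger `prior_df` pulls the variance estimate further toward `prior_var`, which keeps the statistic from being inflated at bases where the observed variance happens to be very small.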

[1]  Michal J. Okoniewski,et al.  rnaSeqMap: a Bioconductor package for RNA sequencing data exploration , 2011, BMC Bioinformatics.

[2]  Gordon K Smyth,et al.  Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments , 2004, Statistical Applications in Genetics and Molecular Biology.

[3]  Sandrine Dudoit,et al.  Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments , 2010, BMC Bioinformatics.

[4]  Robert Patro,et al.  Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms , 2013, ArXiv.

[5]  S. Dudoit,et al.  Multiple Testing Procedures with Applications to Genomics , 2007 .

[6]  Bart De Moor,et al.  BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis , 2005, Bioinform..

[7]  Matthew D. Young,et al.  From RNA-seq reads to differential expression results , 2010, Genome Biology.

[8]  Davis J. McCarthy,et al.  Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation , 2012, Nucleic acids research.

[9]  S. Young,et al.  On adjusting P-values for multiplicity. Response , 1993 .

[10]  Cole Trapnell,et al.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. , 2010, Nature biotechnology.

[11]  Thomas E. Nichols,et al.  Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate , 2002, NeuroImage.

[12]  Paulo P. Amaral,et al.  The Reality of Pervasive Transcription , 2011, PLoS biology.

[13]  Tao Jiang,et al.  IsoLasso: A LASSO Regression Approach to RNA-Seq Based Transcriptome Assembly - (Extended Abstract) , 2011, RECOMB.

[14]  S. Dudoit,et al.  STATISTICAL METHODS FOR IDENTIFYING DIFFERENTIALLY EXPRESSED GENES IN REPLICATED cDNA MICROARRAY EXPERIMENTS , 2002 .

[15]  David G Hendrickson,et al.  Differential analysis of gene regulation at transcript resolution with RNA-seq , 2012, Nature Biotechnology.

[16]  Steven J. M. Jones,et al.  Alternative expression analysis by RNA sequencing , 2010, Nature Methods.

[17]  M. Wigler,et al.  Circular binary segmentation for the analysis of array-based DNA copy number data. , 2004, Biostatistics.

[18]  John D. Storey,et al.  Statistical significance for genomewide studies , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[19]  L. Stein The case for cloud computing in genome informatics , 2010, Genome Biology.

[20]  K. Hansen,et al.  Removing technical variability in RNA-seq data using conditional quantile normalization , 2012, Biostatistics.

[21]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[22]  E. Wang,et al.  Analysis and design of RNA sequencing experiments for identifying isoform regulation , 2010, Nature Methods.

[23]  W. Huber,et al.  Detecting differential usage of exons from RNA-seq data , 2012, Genome research.

[24]  Thomas E. Nichols,et al.  Nonparametric permutation tests for functional neuroimaging: A primer with examples , 2002, Human brain mapping.

[25]  Jr. G. Forney,et al.  Viterbi Algorithm , 1973, Encyclopedia of Machine Learning.

[26]  Gunnar Rätsch,et al.  Statistical Tests for Detecting Differential RNA-Transcript Expression from Read Counts , 2010, ISMB 2011.

[27]  Adrian E. Raftery,et al.  Model-Based Clustering, Discriminant Analysis, and Density Estimation , 2002 .

[28]  Gordon K. Smyth,et al.  limma: Linear Models for Microarray Data , 2005 .

[29]  Jeffrey T. Leek,et al.  The Joint Null Criterion for Multiple Hypothesis Tests , 2011, Statistical Applications in Genetics and Molecular Biology.

[30]  David R. Kelley,et al.  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks , 2012, Nature Protocols.

[31]  David J. Spiegelhalter,et al.  Microarrays, Empirical Bayes and the Two-Groups Model. Comment. , 2008 .

[32]  W. Huber,et al.  MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets , 2011 .

[33]  Sandrine Dudoit,et al.  GC-Content Normalization for RNA-Seq Data , 2011, BMC Bioinformatics.

[34]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[35]  J. Rinn,et al.  Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs , 2010, Nature biotechnology.

[36]  R. Guigó,et al.  Modelling and simulating generic RNA-Seq experiments with the flux simulator , 2012, Nucleic acids research.

[37]  Jeffrey T Leek,et al.  Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. , 2012, International journal of epidemiology.

[38]  T. Tatusova,et al.  Solving the Problem: Genome Annotation Standards before the Data Deluge , 2011, Standards in genomic sciences.

[39]  K. Hansen,et al.  Sequencing technology does not eliminate biological variability , 2011, Nature Biotechnology.

[40]  Xuegong Zhang,et al.  Identifying differentially spliced genes from two groups of RNA-seq samples. , 2013, Gene.

[41]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.