Stability of methods for differential expression analysis of RNA-seq data

Background: As RNA-seq becomes the assay of choice for measuring gene expression levels, differential expression (DE) analysis has received extensive attention from researchers. To date, evaluations of DE methods have focused mostly on validity. Another important aspect of DE methods, stability, has been overlooked and, to the best of our knowledge, has not been studied.

Results: In this study, we empirically show the need to assess the stability of DE methods and propose a stability metric, called Area Under the Correlation curve (AUCOR). AUCOR generates perturbed datasets from a mixture distribution and combines information on the similarities between the sets of features selected from these perturbed datasets and from the original dataset.

Conclusion: Empirical results support that AUCOR can effectively rank DE methods in terms of stability for given RNA-seq datasets. In addition, we explore how biological or technical factors from experiments and data analysis affect the stability of DE methods. AUCOR is implemented in the open-source R package AUCOR, with source code freely available at https://github.com/linbingqing/stableDE.
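
The general idea of a perturbation-based stability score can be illustrated with a minimal Python sketch. It is not the AUCOR definition or the API of the R package: the abstract gives no formula, so every detail here is an assumption made for illustration. In particular, the perturbation replaces each count, with probability `weight`, by a value resampled from the same gene's row; Jaccard similarity stands in for the similarity measure; the selector is a toy top-k ranking by absolute log fold change; and the function names (`perturb`, `select_top_k`, `stability_area`) are hypothetical.

```python
import math
import random


def jaccard(a, b):
    """Jaccard similarity between two sets of selected features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def perturb(counts, weight, rng):
    """Assumed mixture perturbation: keep each count with probability
    1 - weight, otherwise replace it by a value resampled from the same
    gene's row (pooled over all samples)."""
    return [
        [rng.choice(row) if rng.random() < weight else x for x in row]
        for row in counts
    ]


def select_top_k(counts, group_a, group_b, k):
    """Toy DE selector (a stand-in for a real DE method): the k genes
    with the largest absolute log ratio of group means."""
    def score(row):
        mean_a = sum(row[j] for j in group_a) / len(group_a)
        mean_b = sum(row[j] for j in group_b) / len(group_b)
        return abs(math.log((mean_a + 1.0) / (mean_b + 1.0)))

    ranked = sorted(range(len(counts)), key=lambda g: score(counts[g]),
                    reverse=True)
    return set(ranked[:k])


def stability_area(counts, select, weights, n_rep=20, seed=0):
    """Mean similarity to the original selection at each perturbation
    weight, summarized as a normalized trapezoidal area under the curve."""
    rng = random.Random(seed)
    reference = select(counts)
    curve = []
    for w in weights:
        sims = [jaccard(reference, select(perturb(counts, w, rng)))
                for _ in range(n_rep)]
        curve.append(sum(sims) / n_rep)
    area = sum(
        (weights[i + 1] - weights[i]) * (curve[i] + curve[i + 1]) / 2.0
        for i in range(len(weights) - 1)
    )
    return area / (weights[-1] - weights[0]), curve


# Toy count matrix: 6 genes x 4 samples (samples 0-1 vs. samples 2-3).
counts = [
    [10, 12, 50, 55],
    [5, 6, 5, 7],
    [100, 90, 20, 25],
    [3, 4, 3, 3],
    [40, 42, 41, 39],
    [8, 9, 30, 28],
]
area, curve = stability_area(
    counts,
    lambda c: select_top_k(c, [0, 1], [2, 3], k=2),
    weights=[0.0, 0.25, 0.5],
    n_rep=5,
    seed=1,
)
```

Under these assumptions, an area closer to 1 means the selected gene set changes little as the perturbation grows. Comparing the areas obtained by different selectors on the same dataset gives a stability ranking in the spirit of the one described above.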
