RNA-Seq differential expression analysis: An extended review and a software tool

The correct identification of differentially expressed genes (DEGs) between specific conditions is key to understanding phenotypic variation. High-throughput transcriptome sequencing (RNA-Seq) has become the main option for these studies, and the number of methods and software tools for differential expression analysis from RNA-Seq data has increased rapidly as a result. However, there is no consensus about the most appropriate pipeline or protocol for identifying differentially expressed genes from RNA-Seq data. This work presents an extended review of the topic that includes the evaluation of six read-mapping methods, including pseudo-alignment and quasi-mapping, and nine methods of differential expression analysis from RNA-Seq data. The adopted methods were evaluated on real RNA-Seq data, using qRT-PCR data as the reference (gold standard). As part of the results, we developed a software tool that performs all the analyses presented in this work, which is freely available at https://github.com/costasilvati/consexpression. The results indicated that the mapping method has minimal impact on the final DEG analysis, given that the adopted data have an annotated reference genome. For the adopted experimental model, the DEG identification methods with the most consistent results were limma+voom, NOISeq and DESeq2. Additionally, taking the consensus among five DEG identification methods produced a list of DEGs with high accuracy, indicating that combining different methods can produce more suitable results. The consensus option is also included in the available software.
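The consensus idea can be illustrated with a minimal sketch. It assumes that each method's output has already been reduced to a set of gene identifiers called as differentially expressed, and that a qRT-PCR-validated reference set is available. The function names, method labels and gene identifiers below are hypothetical and are not taken from the consexpression code base; the vote threshold of five mirrors the consensus described above.

```python
from collections import Counter


def consensus_degs(calls, min_votes):
    """Return the genes called as DEGs by at least `min_votes` methods.

    `calls` maps a method label to the set of gene IDs that method reported.
    """
    # Count, for every gene, how many methods reported it.
    votes = Counter(gene for genes in calls.values() for gene in set(genes))
    return {gene for gene, n in votes.items() if n >= min_votes}


def precision_recall(predicted, reference):
    """Compare a predicted DEG set against a reference (gold-standard) set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical DEG calls from six methods (illustrative data only).
calls = {
    "method_1": {"g1", "g2", "g3", "g4"},
    "method_2": {"g1", "g2", "g3"},
    "method_3": {"g1", "g2", "g5"},
    "method_4": {"g1", "g2", "g3", "g6"},
    "method_5": {"g1", "g3"},
    "method_6": {"g2", "g7"},
}

# Hypothetical qRT-PCR-validated DEGs used as the reference.
reference = {"g1", "g2", "g3"}

consensus = consensus_degs(calls, min_votes=5)
precision, recall = precision_recall(consensus, reference)
print(sorted(consensus), precision, recall)
```

Raising `min_votes` trades recall for precision: a stricter consensus drops genes supported by only a few methods, which tends to remove false positives but can also discard true DEGs that some methods miss.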
