Guidelines for Setting Up a mRNA Sequencing Experiment and Best Practices for Bioinformatic Data Analysis.

RNA-sequencing, commonly referred to as RNA-seq, is the most recently developed method for the analysis of transcriptomes. It uses high-throughput next-generation sequencing technologies and has revolutionized our understanding of the complexity and dynamics of whole transcriptomes. In this chapter, we recall the key developments in transcriptome analysis and dissect the steps of the general workflow that users can follow to design and perform an mRNA-seq experiment and to process mRNA-seq data obtained with Illumina technology. The chapter proposes guidelines for completing an mRNA-seq study properly and offers recommendations for best practices based on recent literature and on the latest developments in technology and algorithms. We also highlight the large number of choices available (especially for bioinformatic data analysis), which can leave the scientist struggling to decide. In the last part of the chapter, we discuss the new frontiers of single-cell RNA-seq and isoform sequencing by long-read technology.
