Comparability of reference-based and reference-free transcriptome analysis approaches at the gene expression level

Background: High-throughput RNA sequencing has recently been used extensively to elucidate the transcriptome landscape and dynamics of cell types across different species. In particular, for most non-model organisms that lack complete reference genomes with high-quality annotation, reference-free (RF) de novo transcriptome analyses, rather than reference-based (RB) approaches, are widely used, and RF analyses have contributed substantially to understanding the mechanisms that regulate key biological processes and functions. To date, numerous bioinformatics studies have assessed the workflow, production rate, and completeness of transcriptome assemblies within and between RF and RB datasets. However, the degree of consistency and variability of the results obtained when gene expression levels are analyzed through these two approaches has not been adequately documented.

Results: In the present study, we evaluated the differences in expression profiles obtained with RF and RB approaches and revealed that the former tends to be satisfactorily replaced by the latter with respect to transcriptome repertoires, as well as from a gene expression quantification perspective. In addition, we urge cautious interpretation of these findings. Specifically, genes that are lowly expressed, have long coding sequences, or belong to large gene families must be validated carefully whenever gene expression levels are calculated using the RF method.

Conclusions: Our empirical results indicate important contributions toward addressing transcriptome-related biological questions in non-model organisms.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04226-0.
