Comment on "A comprehensive overview and evaluation of circular RNA detection tools"

A recent paper published in PLOS Computational Biology [1] provided a comprehensive evaluation of circular RNA (circRNA)-detection tools. The authors compared 11 circRNA-detection tools using four datasets: three simulated datasets (positive, background, and mixed) and one real dataset. Since the advent of high-throughput next-generation sequencing technology, dozens of computational tools have been developed and used to detect thousands of circRNAs in a diverse range of species. However, the results obtained with different tools differ greatly [2–7], and systematic evaluations of their performance have not been available. The cited work therefore provides a useful guideline for researchers engaged in circRNA studies. Nevertheless, it seems inappropriate to use all CircBase-deposited circRNA candidates (14,689 events) identified in silico from RNA-seq data of HeLa cells [8] as the positive dataset; the reliability of these 14,689 candidates requires further evaluation.

We suggest that three main confounding factors, which may affect the fairness of the evaluation of circRNA-detection tools, should be considered. First, it has been shown that non-co-linear (NCL) junctions (including circRNA and trans-spliced RNA junctions) that do not match annotated exon boundaries tend to be unreliable and are more likely to stem from mis-splicing [9–12], although we cannot rule out the possibility that a few true backspliced junctions originate from unannotated gene loci. Because circRNA candidates are regarded as less reliable if their normalized read counts are depleted after RNase R treatment and as more reliable if they are enriched [13], we reexamined the circRNA candidates detected in the HeLa RNase R-treated and untreated samples (the circRNA candidates and the corresponding read counts were downloaded from the cited study).
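
A minimal sketch of how such a reexamination might be scripted is given here. It assumes that each candidate already carries a normalized read count for the untreated and the RNase R-treated sample. A candidate is labeled completely depleted when it is not detected after treatment, and enriched when its normalized count rises at least 5-fold, the cutoff used in this comment. The `Candidate` class, the function name, the category labels, and the handling of intermediate cases are illustrative assumptions, not the procedure of the cited study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one circRNA candidate."""
    name: str
    untreated_norm: float  # normalized read count, RNase R-untreated sample
    treated_norm: float    # normalized read count, RNase R-treated sample


def classify_rnase_r(cand: Candidate, fold_cutoff: float = 5.0) -> str:
    """Label a candidate by its response to RNase R treatment.

    Assumed labels: 'undetected_untreated', 'completely_depleted',
    'enriched', 'depleted', 'intermediate'.
    """
    if cand.untreated_norm <= 0:
        # No baseline signal, so a fold change cannot be computed.
        return "undetected_untreated"
    if cand.treated_norm <= 0:
        # Not detected at all after RNase R treatment.
        return "completely_depleted"
    fold = cand.treated_norm / cand.untreated_norm
    if fold >= fold_cutoff:
        return "enriched"
    if fold < 1.0:
        return "depleted"
    return "intermediate"


def fraction_with_label(cands, label, fold_cutoff=5.0):
    """Fraction of candidates that receive the given label."""
    cands = list(cands)
    if not cands:
        return 0.0
    hits = sum(classify_rnase_r(c, fold_cutoff) == label for c in cands)
    return hits / len(cands)
```

Applying `fraction_with_label` separately to candidates with annotated and with unannotated exon boundaries would give the kind of per-group comparison discussed in this comment.
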
Of the circRNA candidates with unannotated exon boundaries, we found that 50% to 100% were “completely” depleted (not detected) after RNase R treatment, whereas fewer than 8% were “significantly” enriched (i.e., a 5-fold increase in normalized read count) after RNase R treatment (Fig 1). This result indicates that candidates with unannotated exon boundaries are more likely to be false calls. Thus, we suggest that the CircBase circRNA candidates with unannotated exon boundaries (1,046 events; Table 1) should be excluded from the positive dataset. At a minimum, since circRNA junctions were observed to be predominantly located at canonical splice sites [14–16], candidates whose junctions lack canonical splice site sequences (GT-AG, GC-AG, or AT-AC) should be removed (778 events; Table 1).

Second, ambiguous alignments originating from repetitive sequences or paralogous genes often result in false-positive circRNA detection. In CircBase, most circRNA candidates were identified by find_circ [8]. It has been reported that some find_circ-identified candidates were mis-predicted from paralogous genes [17]. Therefore, alignment ambiguity should be taken into account when using CircBase circRNAs as true positives. To this end, we concatenated the exonic sequence flanking the circRNA junction (within -100 nucleotides
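
The canonical splice site filter suggested for the first confounding factor could be sketched as follows. The sketch assumes that a candidate is given as a 0-based, half-open genomic interval `[start, end)` with a strand, and that the chromosome sequence is available as a plain string; these conventions and all function names are illustrative assumptions. For a backsplice on the plus strand, the donor dinucleotide lies just downstream of the interval and the acceptor just upstream of it; on the minus strand the roles are swapped and the bases are reverse-complemented.

```python
# Canonical (donor, acceptor) dinucleotide pairs named in the text.
CANONICAL_PAIRS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def flanking_dinucleotides(chrom_seq, start, end, strand):
    """Return (donor, acceptor) flanking a backsplice interval [start, end)."""
    left = chrom_seq[start - 2:start].upper()   # 2 nt upstream on the genome
    right = chrom_seq[end:end + 2].upper()      # 2 nt downstream on the genome
    if strand == "+":
        return right, left
    if strand == "-":
        return revcomp(left), revcomp(right)
    raise ValueError("strand must be '+' or '-'")


def has_canonical_splice_sites(chrom_seq, start, end, strand):
    """True if the junction is flanked by GT-AG, GC-AG, or AT-AC."""
    if start < 2 or end + 2 > len(chrom_seq):
        # Flanks fall off the sequence, so the junction cannot be checked.
        return False
    return flanking_dinucleotides(chrom_seq, start, end, strand) in CANONICAL_PAIRS
```

A filter of this kind removes only candidates whose flanking dinucleotides are non-canonical; it does not address whether the junction matches annotated exon boundaries, which would require a gene annotation.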

[1]  Sol Shenker,et al.  Genome-wide analysis of Drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation. , 2014, Cell reports.

[2]  Julia Salzman,et al.  Cell-Type Specific Features of Circular RNA Expression , 2013, PLoS genetics.

[3]  Sanghyuk Lee,et al.  ChimerDB 2.0—a knowledgebase for fusion genes updated , 2009, Nucleic Acids Res..

[4]  Tim Schneider,et al.  Exon circularization requires canonical splice signals. , 2015, Cell reports.

[5]  Enrico Gratton,et al.  In vivo single-cell detection of metabolic oscillations in stem cells. , 2015, Cell reports.

[6]  D. Bartel,et al.  Expanded identification and characterization of mammalian circular RNAs , 2014, Genome Biology.

[7]  C Joel McManus,et al.  Global analysis of trans-splicing in Drosophila , 2010, Proceedings of the National Academy of Sciences.

[8]  Pei Hao,et al.  The evolutionary landscape of intergenic trans-splicing events in insects , 2015, Nature Communications.

[9]  R. Guigó,et al.  Modelling and simulating generic RNA-Seq experiments with the flux simulator , 2012, Nucleic acids research.

[10]  Lee T. Sam,et al.  Transcriptome Sequencing to Detect Gene Fusions in Cancer , 2009, Nature.

[11]  Petar Glažar,et al.  circBase: a database for circular RNAs , 2014, RNA.

[12]  S. Donatelli,et al.  State-of-the-Art Fusion-Finder Algorithms Sensitivity and Specificity , 2013, BioMed research international.

[13]  W. J. Kent,et al.  BLAT--the BLAST-like alignment tool. , 2002, Genome research.

[14]  Linda Szabo,et al.  Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development , 2015, Genome Biology.

[15]  Sebastian D. Mackowiak,et al.  Circular RNAs are a large class of animal RNAs with regulatory potency , 2013, Nature.

[16]  F. Zhao,et al.  CIRI: an efficient and unbiased algorithm for de novo circular RNA identification , 2015, Genome Biology.

[17]  Michael K. Slevin,et al.  Circular RNAs are abundant, conserved, and associated with ALU repeats. , 2013, RNA.

[18]  Trees-Juen Chuang,et al.  Is an observed non-co-linear RNA product spliced in trans, in cis or just in vitro? , 2014, Nucleic acids research.

[19]  Fatih Ozsolak,et al.  RNA sequencing: advances, challenges and opportunities , 2011, Nature Reviews Genetics.

[20]  Trees-Juen Chuang,et al.  Integrative transcriptome sequencing reveals extensive alternative trans-splicing and cis-backsplicing in human cells , 2018, Nucleic acids research.

[21]  Wei Lin,et al.  A comprehensive overview and evaluation of circular RNA detection tools , 2017, PLoS Comput. Biol..

[22]  Jonathan M. Mudge,et al.  Evidence for Transcript Networks Composed of Chimeric RNAs in Human Cells , 2012, PloS one.

[23]  S. Donatelli,et al.  State-of-the-Art Fusion-Finder Algorithms Sensitivity and Specificity , 2013 .

[24]  Thomas D. Wu,et al.  Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples , 2011, BMC Medical Genomics.

[25]  Enrico Macii,et al.  Bellerophontes: an RNA-Seq data analysis framework for chimeric transcripts discovery based on accurate fusion model , 2012, Bioinform..

[26]  Xiang Shao,et al.  Bioinformatic analysis of exon repetition, exon scrambling and trans-splicing in humans , 2006, Bioinform..

[27]  David Tollervey,et al.  Apparent Non-Canonical Trans-Splicing Is Generated by Reverse Transcriptase In Vitro , 2010, PloS one.

[28]  Knut Reinert,et al.  A novel and well-defined benchmarking method for second generation read mapping , 2011, BMC Bioinformatics.

[29]  J. Kjems,et al.  Comparison of circular RNA prediction tools , 2015, Nucleic acids research.

[30]  Trees-Juen Chuang,et al.  Integrative transcriptome sequencing identifies trans-splicing events with important roles in human embryonic stem cell pluripotency , 2014, Genome research.

[31]  Leping Li,et al.  ART: a next-generation sequencing read simulator , 2012, Bioinform..

[32]  Trees-Juen Chuang,et al.  NCLscan: accurate identification of non-co-linear transcripts (fusion, trans-splicing and circular RNA) with a good balance between sensitivity and precision , 2015, Nucleic acids research.

[33]  T. Conrad,et al.  Acfs: accurate circRNA identification and quantification from RNA-Seq data , 2016, Scientific Reports.

[34]  Trees-Juen Chuang,et al.  Biogenesis, identification, and function of exonic circular RNAs , 2015, Wiley interdisciplinary reviews. RNA.

[35]  Mark Wade,et al.  Post-transcriptional exon shuffling events in humans can be evolutionarily conserved and abundant. , 2011, Genome research.

[36]  Marco Beccuti,et al.  State of art fusion-finder algorithms are suitable to detect transcription-induced chimeras in normal tissues? , 2013, BMC Bioinformatics.