CoRAL: predicting non-coding RNAs from small RNA-sequencing data

The surprising observation that virtually the entire human genome is transcribed means that we know little about the function of many emerging classes of RNAs, except their astounding diversity. Traditional RNA function prediction methods rely on sequence or alignment information, which is limited in its ability to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features, including fragment length and cleavage specificity, to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar RNAs and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms.
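As an illustration of what features based on fragment length and cleavage specificity might look like, the following minimal sketch computes two per-locus summaries from aligned small RNA reads. It is not CoRAL's implementation: the function names, the 15-35 nt length window, and the use of Shannon entropy over 5' start positions as a stand-in for cleavage specificity are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def length_features(read_lengths, min_len=15, max_len=35):
    """Fraction of reads at each length in [min_len, max_len].

    The length window is a hypothetical choice for this sketch.
    """
    counts = Counter(n for n in read_lengths if min_len <= n <= max_len)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * (max_len - min_len + 1)
    return [counts[n] / total for n in range(min_len, max_len + 1)]


def positional_entropy(start_positions):
    """Shannon entropy (bits) of 5' start positions within a locus.

    Used here as an assumed proxy for cleavage specificity:
    low entropy means reads start at a few precise positions.
    """
    counts = Counter(start_positions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical locus: (5' start position, read length) for each aligned read.
reads = [(100, 22), (100, 22), (100, 23), (101, 22)]
features = length_features([length for _, length in reads])
features.append(positional_entropy([start for start, _ in reads]))
```

A vector like `features` could then be passed, one row per locus, to any standard multi-class classifier; which classifier and which additional features are appropriate is left open here.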
