Predicting and classifying short non-coding RNAs using a multiclass evolutionary methodology

High-throughput sequencing technologies, together with advanced bioinformatics methods, have uncovered a vast number of short non-coding RNAs. These RNAs are divided into categories according to their cellular function and their sequence, thermodynamic and structural properties. Most existing computational methods predict only one type of non-coding RNA, which limits their applicability in full-transcriptome studies. Only a few methods proposed so far predict multiple types of short non-coding RNAs, and they do not cover the most significant classes. In this paper, we introduce a new multiclass method that combines genetic algorithms with support vector machines and distinguishes among tRNAs, miRNAs, snoRNAs, rRNAs and other RNA sequences with an accuracy above 93%. Finally, the feature selection mechanism of the proposed method uncovers significant characteristics of each of the studied short non-coding RNA classes.
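The general idea of wrapping a genetic algorithm around a classifier to select features can be illustrated with a minimal sketch. This is not the method of the paper: the abstract does not specify the chromosome encoding, fitness function, genetic operators or SVM settings, so everything below is an assumption made for illustration. To keep the example dependency-free, a nearest-centroid classifier stands in for the support vector machine, and the data are synthetic; the names `make_toy_data`, `centroid_accuracy`, `ga_select` and `run_demo` are hypothetical.

```python
import random


def make_toy_data(n_per_class=30, n_features=6, seed=1):
    """Synthetic 3-class data; only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            row[0] += 3 * label
            row[1] -= 3 * label
            X.append(row)
            y.append(label)
    return X, y


def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Held-out accuracy of a nearest-centroid classifier on masked features.

    Assumption: this simple classifier is a stand-in for an SVM.
    """
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    sums, counts = {}, {}
    for x, label in zip(X_train, y_train):
        s = sums.setdefault(label, [0.0] * len(idx))
        for k, i in enumerate(idx):
            s[k] += x[i]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def sq_dist(x, c):
        return sum((x[i] - centroids[c][k]) ** 2 for k, i in enumerate(idx))

    correct = sum(
        min(centroids, key=lambda c: sq_dist(x, c)) == label
        for x, label in zip(X_test, y_test)
    )
    return correct / len(y_test)


def ga_select(fitness, n_features, pop_size=20, generations=15,
              p_mut=0.1, seed=0):
    """Evolve binary feature masks: elitism, tournament selection,
    one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)
    # Seed the population with the all-features mask as a baseline.
    pop = [[1] * n_features]
    pop += [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        elite = ranked[:2]  # the two best masks survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            a = max(rng.sample(ranked, 3), key=fitness)
            b = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit
                     for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


def run_demo():
    X, y = make_toy_data()
    # Even-indexed rows train the classifier, odd-indexed rows score it.
    X_tr, y_tr = X[0::2], y[0::2]
    X_te, y_te = X[1::2], y[1::2]

    def fitness(mask):
        return centroid_accuracy(X_tr, y_tr, X_te, y_te, mask)

    baseline = fitness([1] * len(X[0]))
    best_mask, best_fit = ga_select(fitness, len(X[0]))
    return best_mask, best_fit, baseline


if __name__ == "__main__":
    mask, fit, baseline = run_demo()
    print("selected mask:", mask)
    print("accuracy with selected features:", fit)
    print("accuracy with all features:", baseline)
```

Because the all-features mask is placed in the initial population and the best masks are carried over each generation, the returned subset never scores below the all-features baseline on the held-out split. A realistic setup would replace the stand-in classifier with an SVM and use cross-validation rather than a single split.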
