nRC: non-coding RNA Classifier based on structural features

Motivation: Non-coding RNAs (ncRNAs) are small non-coding sequences involved in the regulation of gene expression in many biological processes and diseases. The recent discovery of a large set of ncRNAs with biologically relevant roles has opened the way to methods that discriminate between the different ncRNA classes. Moreover, the incomplete knowledge of regulatory mechanisms, together with the development of high-throughput technologies, has made bioinformatics tools necessary to give biologists and clinicians a deeper understanding of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach combines feature extraction from the ncRNA secondary structure with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks.

Results: We tested our approach on the classification of 13 different ncRNA classes and evaluated it using the most common statistical measures. In particular, we reach accuracy and sensitivity scores of about 74%.

Conclusion: The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool, which, to date, is the reference classifier. The nRC tool is freely available as a Docker image at https://hub.docker.com/r/tblab/nrc/. The source code of nRC is also available at https://github.com/IcarPA-TBlab/nrc.
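
For readers unfamiliar with the two measures quoted in the Results, the following is a minimal sketch of how accuracy and sensitivity can be computed for a multi-class problem. It is not taken from the nRC code: the function name, the toy labels, and the choice of macro-averaging sensitivity over classes are all assumptions made for illustration, since the abstract does not state how the per-class scores were aggregated.

```python
from collections import Counter


def accuracy_and_macro_sensitivity(y_true, y_pred):
    """Return (accuracy, macro-averaged sensitivity) for multi-class labels.

    Hypothetical helper for illustration only; macro-averaging is an
    assumption, not necessarily the aggregation used by the authors.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))

    # Accuracy: fraction of all sequences assigned to their true class.
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    # Sensitivity (recall) per class: correctly predicted members of the
    # class divided by all true members of the class.
    support = Counter(y_true)
    true_pos = Counter(t for t, p in pairs if t == p)
    per_class = [true_pos[c] / support[c] for c in support]

    # Macro average: every class counts equally, regardless of its size.
    return accuracy, sum(per_class) / len(per_class)


# Toy labels (made up): four miRNA, two tRNA, two riboswitch sequences.
y_true = ["miRNA"] * 4 + ["tRNA"] * 2 + ["riboswitch"] * 2
y_pred = ["miRNA"] * 4 + ["tRNA"] * 2 + ["riboswitch", "miRNA"]

acc, sens = accuracy_and_macro_sensitivity(y_true, y_pred)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f}")
```

With unbalanced classes the two numbers can differ, because accuracy weights every sequence equally while macro-averaged sensitivity weights every class equally.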
