A Novel Integrative Approach for Non-coding RNA Classification Based on Deep Learning

Background: Molecular biomarkers offer new ways to understand many disease processes. Noncoding RNAs (ncRNAs) play a crucial role in several cellular activities and are highly correlated with many human diseases, especially cancer. Because ncRNAs can serve as biomarkers for these diseases, their classification and identification have become a critical issue. Objective: Most existing computational tools for ncRNA classification handle only one type of ncRNA. They rely on structural information or specific known features, and they suffer from a lack of significant and validated features. As a result, the performance of these methods is not always satisfactory. Methods: We propose a novel approach named imCnC for ncRNA classification based on multisource deep learning, which integrates several data sources, such as genomic and epigenomic data, to identify several ncRNA types. We also propose an optimization technique that visualizes the feature patterns extracted by the multisource CNN model in order to measure the epigenomic features of each ncRNA type. Results: Computational results on a dataset of 16 human ncRNA classes downloaded from Rfam show that imCnC outperforms the existing tools, achieving an accuracy of 94.18%. In addition, our method enables the discovery of new ncRNA features by using the optimization technique to measure and visualize the feature patterns of the imCnC classifier.
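To make the idea of multisource integration concrete, the following is a minimal sketch of how a sequence branch and an epigenomic branch might be fused before classification. It is not the imCnC architecture, which the abstract does not specify. The function names, layer sizes, random untrained weights, and the two-value epigenomic vector are all hypothetical and chosen only for illustration.

```python
import math
import random

ALPHABET = "ACGU"  # RNA bases; illustrative encoding, not taken from the paper


def one_hot(seq):
    """Encode an RNA sequence as a list of 4-dimensional one-hot rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def conv1d_global_max(x, filters):
    """Valid 1D convolution, ReLU, then global max pooling (one value per filter)."""
    feats = []
    for f in filters:  # f is a list of k rows, each holding 4 weights
        k = len(f)
        best = 0.0  # starting at 0 applies the ReLU floor
        for start in range(len(x) - k + 1):
            act = sum(f[i][j] * x[start + i][j]
                      for i in range(k) for j in range(4))
            best = max(best, act)
        feats.append(best)
    return feats


def dense(v, weights, relu=True):
    """Fully connected layer without bias, optionally followed by ReLU."""
    out = [sum(w * a for w, a in zip(row, v)) for row in weights]
    return [max(0.0, o) for o in out] if relu else out


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def build_model(n_epi, n_classes, n_filters=3, kernel=3, n_hidden=4, seed=0):
    """Create random (untrained) weights for the two branches and the output layer."""
    rng = random.Random(seed)
    return {
        "filters": [rand_matrix(rng, kernel, 4) for _ in range(n_filters)],
        "epi": rand_matrix(rng, n_hidden, n_epi),
        "out": rand_matrix(rng, n_classes, n_filters + n_hidden),
    }


def predict(model, seq, epi):
    """Run both source branches, concatenate their features, and classify."""
    seq_feats = conv1d_global_max(one_hot(seq), model["filters"])  # genomic branch
    epi_feats = dense(epi, model["epi"])                           # epigenomic branch
    fused = seq_feats + epi_feats                                  # feature-level fusion
    return softmax(dense(fused, model["out"], relu=False))


# Hypothetical inputs: a short sequence and two made-up epigenomic signal values.
model = build_model(n_epi=2, n_classes=16)
probs = predict(model, "GGCUAGCUAGG", [0.8, 0.1])
```

Here `probs` holds one probability per class (16 in this toy setup, matching the number of ncRNA classes in the dataset). Concatenating per-source features before a shared output layer is only one possible fusion strategy; a trained model would learn these weights from data instead of drawing them at random.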
