ncRDeep: Non-coding RNA classification with convolutional neural network

A non-coding RNA (ncRNA) is an RNA that is not translated into protein; nevertheless, ncRNAs are involved in many biological processes, diseases, and cancers. Numerous ncRNAs have been identified and classified using high-throughput sequencing technology, so accurate prediction of ncRNA classes is necessary for further study of their functions. Several computational techniques have been employed to predict the class of an ncRNA. Recent classification methods use the secondary structure as their primary input. However, computational tools for predicting RNA secondary structure are not accurate enough, which limits the final performance of ncRNA predictors. In this paper, we propose a simple yet efficient method, called ncRDeep, for ncRNA class prediction. It uses a simple convolutional neural network and RNA sequence information only. ncRDeep was evaluated on benchmark datasets, and the comparison results showed that it significantly outperforms the state-of-the-art methods. More specifically, the average accuracy was improved by 8.32%. Finally, we built a freely accessible web server for ncRDeep at http://home.jbnu.ac.kr/NSCL/ncRDeep.htm.
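To illustrate what a sequence-only convolutional input can look like, the following is a minimal sketch and not the ncRDeep implementation. It assumes a one-hot encoding over the alphabet A, C, G, U with zero padding to a fixed length, followed by a single hand-written convolution filter with ReLU. The alphabet order, the padding scheme, the helper names `one_hot_encode` and `conv1d_relu`, and the example filter are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import List

ALPHABET = "ACGU"  # assumed nucleotide ordering (hypothetical, not from the paper)


def one_hot_encode(seq: str, max_len: int) -> List[List[float]]:
    """Encode an RNA sequence as a max_len x 4 matrix.

    Unknown symbols (e.g. N) become all-zero rows; sequences are
    truncated or zero-padded to max_len. These are assumptions.
    """
    seq = seq.upper().replace("T", "U")
    matrix = []
    for base in seq[:max_len]:
        row = [0.0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1.0
        matrix.append(row)
    while len(matrix) < max_len:
        matrix.append([0.0] * len(ALPHABET))
    return matrix


def conv1d_relu(x: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Slide one filter over sequence positions (no padding), then apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        total = 0.0
        for i in range(width):
            for c in range(len(ALPHABET)):
                total += x[start + i][c] * kernel[i][c]
        out.append(max(0.0, total))
    return out


# Hypothetical filter that responds to the dinucleotide "GU".
gu_filter = [
    [0.0, 0.0, 1.0, 0.0],  # position 0 matches G
    [0.0, 0.0, 0.0, 1.0],  # position 1 matches U
]

encoded = one_hot_encode("ACGUN", max_len=6)
activations = conv1d_relu(encoded, gu_filter)
pooled = max(activations)  # global max pooling over positions
```

In a trained network the filter weights are learned rather than written by hand, and many filters run in parallel before the classification layers. The point of the sketch is only that the input needs no secondary-structure prediction step.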
