A Robust and Precise ConvNet for Small Non-Coding RNA Classification (RPC-snRC)

Functional or non-coding RNAs are attracting increasing attention because they are now considered potentially valuable resources for developing new drugs intended to cure several human diseases. Identifying drugs that target the regulatory circuits of functional RNAs depends on knowing their family, a task known as RNA sequence classification. State-of-the-art small non-coding RNA classification methodologies take secondary structural features as input. However, their feature extraction approaches take only global characteristics into account and completely overlook the correlative effect of local structures. Furthermore, secondary-structure-based approaches involve a high-dimensional feature space, which proves computationally expensive. This paper proposes a novel Robust and Precise ConvNet (RPC-snRC) methodology that classifies small non-coding RNA sequences into their relevant families by utilizing the primary sequence of RNAs. RPC-snRC learns a hierarchical representation of features from the positioning and occurrence information of nucleotides. To avoid exploding and vanishing gradient problems, we use an approach similar to DenseNet, in which gradients can flow directly from subsequent layers to previous layers. To assess the effectiveness of deeper architectures for small non-coding RNA classification, we also adapted two ResNet architectures with different numbers of layers. Experimental results on a benchmark small non-coding RNA dataset show that our proposed methodology not only outperforms existing small non-coding RNA classification approaches by a significant performance margin of 10% but also outshines the adapted ResNet architectures.
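
The abstract does not specify the exact input encoding or layer configuration of RPC-snRC, so the following is only a minimal pure-Python sketch under stated assumptions. One-hot encoding stands in for the "positioning" information, k-mer counts stand in for the "occurrence" information, and a simplified DenseNet-style block (random weights, no batch normalization, forward pass only, no training) illustrates how each layer receives the concatenation of all earlier feature maps, which is what gives gradients a direct path back to earlier layers. The names `one_hot`, `kmer_counts`, `conv1d_same` and `dense_block`, as well as the padded length, kernel size, and growth rate, are hypothetical choices and are not taken from the paper.

```python
import random
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"


def one_hot(seq, max_len):
    """Positional encoding (assumption): one row per position, one column per
    nucleotide; sequences are truncated or zero-padded to max_len."""
    seq = seq.upper().replace("T", "U")
    matrix = [[0.0] * len(NUCLEOTIDES) for _ in range(max_len)]
    for i, base in enumerate(seq[:max_len]):
        if base in NUCLEOTIDES:
            matrix[i][NUCLEOTIDES.index(base)] = 1.0
    return matrix


def kmer_counts(seq, k):
    """Occurrence encoding (assumption): count of every possible k-mer,
    in the lexicographic order produced by itertools.product."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(NUCLEOTIDES, repeat=k)]


def conv1d_same(x, weights, bias):
    """Zero-padded ('same' length) 1D convolution followed by ReLU.
    x: [length][in_ch]; weights: [out_ch][kernel][in_ch]; bias: [out_ch]."""
    length, kernel = len(x), len(weights[0])
    pad = kernel // 2
    out = []
    for t in range(length):
        row = []
        for oc, w in enumerate(weights):
            s = bias[oc]
            for j in range(kernel):
                p = t + j - pad
                if 0 <= p < length:
                    s += sum(wi * xi for wi, xi in zip(w[j], x[p]))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def dense_block(x, num_layers, growth_rate, kernel=3, seed=0):
    """Simplified DenseNet-style block: every layer is applied to the
    concatenation of the block input and all previous layer outputs, so the
    channel count grows by growth_rate per layer. Weights are random here."""
    rng = random.Random(seed)
    features = x
    for _ in range(num_layers):
        in_ch = len(features[0])
        weights = [[[rng.gauss(0.0, 0.1) for _ in range(in_ch)]
                    for _ in range(kernel)]
                   for _ in range(growth_rate)]
        bias = [0.0] * growth_rate
        new = conv1d_same(features, weights, bias)
        # Channel-wise concatenation keeps every earlier feature map intact.
        features = [f + n for f, n in zip(features, new)]
    return features


if __name__ == "__main__":
    x = one_hot("GGCUAGCUAGCUAGGAUC", max_len=24)
    y = dense_block(x, num_layers=3, growth_rate=8)
    print(len(y), len(y[0]))  # 24 28  (4 input channels + 3 * 8 new ones)
```

Because the block input is carried forward unchanged by concatenation, the original one-hot channels remain the first four channels of the output. In a trainable version, those same identity paths are what let gradients reach early layers without passing through every intermediate transformation.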
