Class similarity network for coding and long non-coding RNA classification

Background: Long non-coding RNAs (lncRNAs) play significant roles in a variety of physiological and pathological processes. Functional studies of lncRNAs depend on the lncRNAs being identified correctly in the first place. Recently, deep learning methods such as the convolutional neural network (CNN) have been applied successfully to lncRNA identification. However, a traditional CNN captures relationships among samples only weakly and indirectly.

Results: Inspired by the Siamese Neural Network (SNN), we propose a novel network, named Class Similarity Network, for coding RNA and lncRNA classification. Class Similarity Network models relationships among input samples directly. It explores the potential relationships between an input sample and samples from both the same class and the other class. To achieve this, Class Similarity Network trains parameters specific to each class to obtain high-level features, and represents the general similarity of the input to each class in a single node. Comparison on the validation dataset under the same conditions shows that Class Similarity Network outperforms the baseline CNN. In addition, our method achieves state-of-the-art performance on two test datasets.

Conclusions: We construct Class Similarity Network for coding RNA and lncRNA classification. It works effectively on two different datasets, achieving accuracy, precision, and F1-score of 98.43%, 0.9247, and 0.9374 on the first and 97.54%, 0.9990, and 0.9860 on the second.
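
The per-class design described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no layer sizes, activations, or training details, so the names `ClassBranch` and `classify`, the single hidden layer, the ReLU and sigmoid activations, the random (untrained) weights, and the argmax decision rule are all assumptions made for illustration. The sketch only shows the structural idea that each class has its own parameters and one node holding the similarity of the input to that class.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one row of weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ClassBranch:
    """Hypothetical branch for one class: its own parameters, one similarity node."""

    def __init__(self, in_dim, hidden_dim, rng):
        # Class-specific parameters (random here; they would be learned in practice).
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                         for _ in range(hidden_dim)]
        self.b_hidden = [0.0] * hidden_dim
        self.w_node = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.b_node = 0.0

    def similarity(self, features):
        # High-level features specific to this class.
        hidden = relu(dense(features, self.w_hidden, self.b_hidden))
        # Single node summarizing similarity of the input to this class.
        z = sum(w * h for w, h in zip(self.w_node, hidden)) + self.b_node
        return sigmoid(z)


def classify(features, branches):
    """Assumed decision rule: pick the class whose similarity node is largest."""
    scores = [branch.similarity(features) for branch in branches]
    label = max(range(len(scores)), key=scores.__getitem__)
    return label, scores


rng = random.Random(0)
# Assumed label convention: index 0 = coding RNA, index 1 = lncRNA.
branches = [ClassBranch(in_dim=4, hidden_dim=3, rng=rng) for _ in range(2)]
# Stand-in feature vector; in a real model this would come from a sequence encoder.
label, scores = classify([0.2, -0.1, 0.7, 0.3], branches)
print(label, scores)
```

In this sketch the two branches share the same input features but no weights, which is one plausible reading of "parameters specific to each class"; how the feature extractor is shared or trained is not stated in the abstract.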
