circDeep: deep learning approach for circular RNA classification from other long non-coding RNA

Abstract

Motivation: Over the past two decades, a circular form of RNA (circular RNA), produced through alternative splicing, has become a focus of scientific study because of its major role as a modulator of microRNA (miRNA) activity and its association with various diseases, including cancer. Detecting circular RNAs is therefore vital to understanding their biogenesis and purpose. Prediction of circular RNA can be achieved in three steps: distinguishing non-coding RNAs from protein-coding gene transcripts, separating short from long non-coding RNAs, and distinguishing circular RNAs from other long non-coding RNAs (lncRNAs). However, because the last classification step is difficult, the available tools distinguish circular RNAs from other lncRNAs with less than 80 percent accuracy. A faster and more accurate machine learning method for identifying circular RNAs, one that considers the specific features of circular RNA, is therefore essential to the development of systematic annotation.

Results: Here we present an end-to-end deep learning framework, circDeep, to distinguish circular RNAs from other lncRNAs. circDeep fuses an RCM descriptor, an ACNN-BLSTM sequence descriptor and a conservation descriptor into high-level abstraction descriptors, in which the shared representations across the different modalities are integrated. The experiments show that circDeep is not only faster than existing tools but also more accurate, achieving a 12 percent increase in accuracy over the other tools.

Availability and implementation: https://github.com/UofLBioinformatics/circDeep.

Supplementary information: Supplementary data are available at Bioinformatics online.
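The descriptor fusion mentioned in the Results can be illustrated with a minimal sketch. It is not the circDeep implementation: the abstract does not specify how the three descriptors are combined, so plain concatenation followed by a single logistic unit is assumed here purely for illustration. The function names, the toy descriptor values and the untrained random weights are all hypothetical.

```python
import math
import random


def fuse_descriptors(rcm, seq, cons):
    """Concatenate three per-transcript descriptor vectors into one.

    Assumption: simple concatenation stands in for the learned fusion
    layers that the abstract describes only at a high level.
    """
    return list(rcm) + list(seq) + list(cons)


def predict_circular(features, weights, bias):
    """Return a logistic score in (0, 1); higher means more circRNA-like."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy descriptors for a single transcript.
rcm = [0.2, 0.7]        # stands in for the RCM descriptor
seq = [0.1, 0.4, 0.9]   # stands in for the ACNN-BLSTM sequence descriptor
cons = [0.5]            # stands in for the conservation descriptor

fused = fuse_descriptors(rcm, seq, cons)

# Untrained random weights, for illustration only.
rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in fused]
score = predict_circular(fused, weights, bias=0.0)
print(len(fused), score)
```

In a trained model the weights would be learned jointly with the descriptor extractors, so that the fused representation captures information shared across the three modalities.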
