Identifying Circular RNA and Predicting Its Regulatory Interactions by Machine Learning

Circular RNA (circRNA) is a class of long non-coding RNA (lncRNA) that forms a covalently closed loop through back-splicing. Emerging evidence indicates that circRNA can influence cellular physiology through various molecular mechanisms. Thus, accurate identification of circRNA and prediction of its regulatory information are critical for understanding its biogenesis. Although several machine learning-based computational tools have been proposed for circRNA identification, their prediction accuracy leaves room for improvement. Here, we first present circLGB, a machine learning-based framework that discriminates circRNA from other lncRNAs. circLGB integrates commonly used sequence-derived features with three new features: adenosine-to-inosine (A-to-I) deamination, A-to-I density, and the internal ribosome entry site. circLGB classifies circRNAs using a LightGBM classifier with feature selection. Second, we introduce circMRT, an ensemble machine learning framework that systematically predicts the regulatory information of circRNA, including its interactions with microRNAs, its interactions with RNA-binding proteins, and its transcriptional regulation. circMRT models several feature sets: sequence-based features, graph features, genome context, and regulatory information features. Experiments on public datasets and on datasets we constructed show that the proposed algorithms outperform available state-of-the-art methods. circLGB is available at http://www.circlgb.com. Source code is available at https://github.com/Peppags/circLGB-circMRT.
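
As a rough illustration of how sequence-derived features and an A-to-I density feature might be assembled into one vector for a gradient boosting classifier such as LightGBM, consider the minimal sketch below. It is not the circLGB implementation: the function names, the choice of k-mer frequencies as the sequence-derived features, and the definition of A-to-I density as editing sites per nucleotide are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequency of every RNA k-mer in seq (hypothetical helper)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    total = max(len(seq) - k + 1, 0)  # number of k-mer windows
    counts = Counter(seq[i:i + k] for i in range(total))
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def a_to_i_density(seq_len, editing_sites):
    """Assumed definition: distinct A-to-I sites per nucleotide.

    editing_sites holds 0-based positions within the transcript;
    duplicates and out-of-range positions are ignored.
    """
    if seq_len <= 0:
        return 0.0
    in_range = {p for p in editing_sites if 0 <= p < seq_len}
    return len(in_range) / seq_len


def feature_vector(seq, editing_sites, k=2):
    """Concatenate k-mer frequencies (sorted by k-mer) with A-to-I density."""
    freqs = kmer_frequencies(seq, k)
    return [freqs[m] for m in sorted(freqs)] + [
        a_to_i_density(len(seq), editing_sites)
    ]
```

Vectors built this way for labelled circRNA and other lncRNA transcripts could then be passed to a classifier of choice; feature selection and the internal ribosome entry site feature are left out of this sketch.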
