PASSION: an ensemble neural network approach for identifying the binding sites of RBPs on circRNAs

MOTIVATION Unlike traditional linear RNAs, which have 5' and 3' ends, circular RNAs (circRNAs) are a special type of RNA with a closed ring structure. Accumulating evidence indicates that circRNAs can directly bind proteins and participate in a myriad of biological processes.

RESULTS To identify interactions between circRNAs and 37 different types of circRNA-binding proteins (RBPs), we develop an ensemble neural network, termed PASSION, which concatenates an artificial neural network (ANN) with a hybrid deep neural network. Specifically, the input of the ANN is the optimal feature subset for each RBP, selected from six types of feature encoding schemes through incremental feature selection combined with the XGBoost algorithm. The input of the hybrid deep neural network is a stacked codon-based encoding scheme. Benchmarking experiments indicate that the ensemble neural network achieves the best average AUC, 0.883, across the 37 circRNA datasets when compared with XGBoost, k-nearest neighbor, support vector machine, random forest, logistic regression and naive Bayes. Moreover, each of the 37 RBP models is extensively evaluated in independent tests at sequence similarity thresholds of 0.8, 0.7, 0.6 and 0.5. The corresponding average AUCs are 0.883, 0.876, 0.868 and 0.883, respectively, highlighting the effectiveness and robustness of PASSION. Extensive benchmarking experiments demonstrate that PASSION achieves competitive performance in identifying circRNA-RBP binding sites when compared with several state-of-the-art methods.

AVAILABILITY A user-friendly web server of PASSION is publicly accessible at http://flagship.erc.monash.edu/PASSION/.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
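As an illustration of the incremental feature selection step mentioned in the RESULTS, the following is a minimal sketch and not the PASSION implementation. It assumes that features have already been ranked by importance (in PASSION the ranking comes from XGBoost; here the ranking is simply given), and it uses a hypothetical toy scorer, the sum of the selected feature values, in place of a trained classifier. The function names, the toy data and the scorer are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def incremental_feature_selection(
    ranked: Sequence[int],
    evaluate: Callable[[Sequence[int]], float],
) -> Tuple[List[int], float]:
    """Grow the subset one ranked feature at a time; keep the best prefix."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:  # strict: prefer the smaller subset on ties
            best_k, best_score = k, score
    return list(ranked[:best_k]), best_score


# Toy data (assumed): 4 samples x 3 features, two positives then two negatives.
X = [
    [0.9, 0.5, 0.0],
    [0.4, 0.9, 0.1],
    [0.5, 0.1, 0.9],
    [0.1, 0.2, 0.8],
]
y = [1, 1, 0, 0]

# Assumed importance ranking, most important feature first.
ranked_features = [0, 1, 2]


def toy_evaluate(subset: Sequence[int]) -> float:
    # Stand-in for training and validating a model on the chosen features.
    scores = [sum(row[j] for j in subset) for row in X]
    return auc(y, scores)


selected, best_auc = incremental_feature_selection(ranked_features, toy_evaluate)
print(selected, best_auc)
```

In practice the `evaluate` callback would train and cross-validate a real model on the candidate subset; the strict comparison keeps the smallest subset among equally scoring prefixes.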
