Predicting disease-associated circular RNAs using deep forests combined with positive-unlabeled learning methods

Identifying disease-associated circular RNAs (circRNAs) is of critical importance, especially given the dramatic increase in the number of identified circRNAs. However, experimentally validated disease-associated circRNAs are scarce, which hinders the development of effective computational methods. To our knowledge, systematic approaches for predicting disease-associated circRNAs are still lacking. In this study, we propose combining deep forests with positive-unlabeled (PU) learning methods to predict potential disease-related circRNAs. Specifically, we constructed a heterogeneous biological network involving 17 961 circRNAs, 469 miRNAs, and 248 diseases, and then extracted 24 meta-path-based topological features. We applied 5-fold cross-validation on 15 disease data sets to benchmark the proposed approach against other competitive methods, using Recall@k and PRAUC@k to evaluate performance. Overall, our method outperformed the other methods. In addition, the performance of all methods improved as more known positive labels accumulated. Our results provide a new framework for investigating the associations between circRNAs and diseases and might improve our understanding of circRNA functions.
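The abstract does not state which 24 meta-paths were used or exactly how Recall@k was normalized, so the following is only a minimal Python sketch, under stated assumptions, of two ingredients it names: counting instances of one plausible meta-path (circRNA-miRNA-disease, written C-M-D) in the heterogeneous network, and scoring a ranked candidate list with Recall@k. The function names `metapath_counts` and `recall_at_k`, the binary adjacency-matrix inputs, and the choice of the C-M-D path are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Set


def metapath_counts(cm: List[List[int]], md: List[List[int]]) -> List[List[int]]:
    """Count C-M-D meta-path instances (hypothetical helper).

    cm[i][k] = 1 if circRNA i interacts with miRNA k (assumed binary input).
    md[k][j] = 1 if miRNA k is associated with disease j (assumed binary input).
    Entry (i, j) of the product cm @ md is the number of distinct miRNAs
    that link circRNA i to disease j.
    """
    n_c, n_m = len(cm), len(md)
    n_d = len(md[0]) if md else 0
    out = [[0] * n_d for _ in range(n_c)]
    for i in range(n_c):
        for k in range(n_m):
            if cm[i][k]:  # only miRNAs adjacent to circRNA i contribute
                for j in range(n_d):
                    out[i][j] += cm[i][k] * md[k][j]
    return out


def recall_at_k(scores: Dict[str, float], positives: Set[str], k: int) -> float:
    """Fraction of held-out positive circRNAs ranked within the top k.

    This is one common definition; the paper's exact normalization
    may differ (assumption).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for c in ranked if c in positives)
    return hits / len(positives) if positives else 0.0
```

For example, with two circRNAs, three miRNAs, and two diseases, `metapath_counts([[1, 0, 1], [0, 1, 0]], [[1, 0], [1, 1], [1, 1]])` gives `[[2, 1], [1, 1]]`. Longer meta-paths (e.g., C-M-C-M-D) could be counted the same way by chaining further adjacency-matrix products, using transposes to traverse an edge in reverse; the deep-forest and PU-learning components are not shown here.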
