Learning Representations to Predict Intermolecular Interactions on Large-Scale Heterogeneous Molecular Association Network

Summary

Functionally interdependent molecular components in human cells constitute molecular association networks, and diseases can arise from the disruption of multiple molecular interactions. Discovering new biomolecular interactions can reveal new regulatory mechanisms. To this end, we construct a heterogeneous molecular association network by systematically integrating comprehensive associations among miRNAs, lncRNAs, circRNAs, mRNAs, proteins, drugs, microbes, and complex diseases. We propose MMI-Pred, a machine learning method for predicting intermolecular interactions. Specifically, a network embedding model is developed to fully exploit the network behavior of biomolecules, and attribute features are also calculated. These discriminative features are then combined to train a random forest classifier that predicts intermolecular interactions. Under 5-fold cross-validation, MMI-Pred achieves 93.50% accuracy in predicting hybrid associations. This work provides a systematic landscape and a machine learning method for modeling and inferring complex associations between diverse biological components.
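The summary names the feature-construction steps without detail. The Python below is a minimal sketch, not MMI-Pred's implementation. It assumes a DeepWalk-style, walk-based embedding, since the summary only says "network embedding model". The sketch builds an undirected heterogeneous network from association pairs, generates truncated uniform random walks from every node, and concatenates each node's attribute and behavior vectors into one feature row for a candidate pair. The node names, the functions `build_network`, `random_walks`, and `pair_feature`, and the walk parameters are all hypothetical. Two steps are not shown: learning behavior vectors from the walks (for example with a skip-gram model) and training the random forest.

```python
import random
from collections import defaultdict

# Hypothetical toy associations; node names are placeholders, not real data.
EDGES = [
    ("miRNA_1", "mRNA_1"),
    ("miRNA_1", "disease_1"),
    ("lncRNA_1", "miRNA_1"),
    ("circRNA_1", "miRNA_1"),
    ("protein_1", "mRNA_1"),
    ("drug_1", "protein_1"),
    ("microbe_1", "disease_1"),
]


def build_network(edges):
    """Undirected adjacency lists; all node types share one graph."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def random_walks(adj, walks_per_node=2, walk_length=5, seed=0):
    """Truncated uniform random walks started once per node per pass.

    Assumption: DeepWalk-style walks; the resulting "sentences" of node
    names would be fed to a skip-gram model to learn behavior vectors.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def pair_feature(a, b, attribute, behavior):
    """Concatenate attribute and behavior vectors of both nodes in a pair."""
    return attribute[a] + behavior[a] + attribute[b] + behavior[b]


if __name__ == "__main__":
    adj = build_network(EDGES)
    walks = random_walks(adj)
    print(len(walks), "walks, e.g.", walks[0])
```

In a complete pipeline, the concatenated rows for known and unknown pairs would serve as training data for a random forest. One option is scikit-learn's `RandomForestClassifier` scored with `cross_val_score(..., cv=5)`. The summary does not say how MMI-Pred selects negative pairs.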
