Predicting links between tumor samples and genes using 2-Layered graph based diffusion approach

Background: Determining the association between a tumor sample and a gene is demanding because the genetic experiments involved are costly, and any association that is discovered must still be clinically verified and validated. The entire process is time-consuming and expensive. As a result, predicting associations between tumor samples and genes remains a challenge in biomedicine.

Results: Here we present a computational model based on a heat diffusion algorithm that predicts associations between tumor samples and genes. We propose a 2-layered graph. In the first layer, tumor sample nodes and gene nodes are connected by a “hasGene” relationship. In the second layer, gene nodes are connected to each other by an “interaction” relationship. We applied the heat diffusion algorithm to nine different variants of genetic interaction networks extracted from the STRING and BioGRID databases. The algorithm predicted links between tumor samples and genes with a mean AUC-ROC score of 0.84, obtained using weighted genetic interactions from the fusion or co-occurrence channels of the STRING database. With unweighted genetic interactions from the BioGRID database, the algorithm predicted the links with an AUC-ROC score of 0.74.

Conclusions: We demonstrate that gene-gene interaction scores can improve the power of the heat diffusion model to predict links between tumor samples and genes. We show that the heat diffusion algorithm runs efficiently on various genetic interaction networks, and we statistically validated the quality of the predicted links between tumor samples and genes.
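The 2-layered graph and the diffusion step described in the Results can be illustrated with a small sketch. The abstract does not give the exact formulation, so this is not the authors' implementation: it assumes a plain discrete heat equation integrated with explicit Euler steps, a weight of 1.0 on every “hasGene” edge, and a toy graph whose node names (`S1`, `S2`, `G1`, `G2`, `G3`), interaction weights, step size, and function names are all hypothetical.

```python
# Minimal sketch of heat diffusion on a 2-layered graph (assumptions only;
# node names, weights, and parameters are illustrative, not from the paper).

# Layer 1: tumor sample -> gene ("hasGene"); weight 1.0 is an assumption.
HAS_GENE = [("S1", "G1", 1.0), ("S2", "G2", 1.0)]
# Layer 2: gene -- gene ("interaction"); weights stand in for interaction scores.
INTERACTION = [("G2", "G3", 0.9), ("G1", "G2", 0.2)]


def diffuse_heat(edges, source, step=0.1, n_steps=20):
    """Explicit Euler integration of dh/dt = -L h on an undirected weighted graph.

    `step` times the largest weighted degree must stay below 1 so that
    heat values remain non-negative.
    """
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    heat = {n: 0.0 for n in nodes}
    heat[source] = 1.0  # all heat starts on the query tumor sample
    for _ in range(n_steps):
        delta = {n: 0.0 for n in nodes}
        for u, v, w in edges:
            # Heat flows along each edge from the hotter to the cooler node.
            flow = step * w * (heat[u] - heat[v])
            delta[u] -= flow
            delta[v] += flow
        for n in nodes:
            heat[n] += delta[n]
    return heat


def rank_candidate_genes(sample, has_gene, interaction, **kwargs):
    """Rank genes not yet linked to `sample` by the heat they receive."""
    heat = diffuse_heat(has_gene + interaction, sample, **kwargs)
    known = {g for s, g, _ in has_gene if s == sample}
    genes = {g for _, g, _ in has_gene}
    genes |= {n for u, v, _ in interaction for n in (u, v)}
    candidates = sorted(genes - known, key=lambda g: heat[g], reverse=True)
    return candidates, heat


ranking, heat = rank_candidate_genes("S2", HAS_GENE, INTERACTION)
print(ranking)  # candidate genes for S2, hottest first
```

In this toy graph the candidate that interacts strongly with a gene already linked to the sample collects more heat than the weakly connected one, which is the intuition behind using interaction scores as edge weights. For networks of realistic size, a sparse matrix exponential routine would likely replace the Euler loop.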
