Heterogeneous networks integration for disease-gene prioritization with node kernels

MOTIVATION: The identification of disease-gene associations is a task of fundamental importance in human health research. A typical approach first encodes large gene/protein relational data sets as networks, since graphs represent relationships between objects naturally and intuitively, and then uses graph-based techniques to prioritize genes for subsequent low-throughput validation assays. Because different types of interactions between genes yield distinct gene networks, heterogeneous sources need to be integrated to improve the reliability of prioritization systems.

RESULTS: We propose an approach in three phases: first, we merge all sources into a single network; then, we partition the integrated network according to edge density, introducing a notion of edge type to distinguish the parts; finally, we employ a novel node kernel suitable for graphs with typed edges. We show how the node kernel can generate a large number of discriminative features that can be efficiently processed by linear regularized machine learning classifiers. We report state-of-the-art results on 12 disease-gene associations and on a time-stamped benchmark containing 42 newly discovered associations.

SUPPLEMENTARY INFORMATION, SOURCE CODE: https://github.com/dinhinfotech/DiGI.git
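
The three phases can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository above for that), and it rests on several assumptions that the abstract does not state: k-core numbers serve as a stand-in for edge density, the `threshold` parameter and the edge types `"dense"` and `"sparse"` are hypothetical, and the per-node count of incident edges of each type is only a toy replacement for the node kernel. All function names are illustrative.

```python
from collections import defaultdict


def merge_networks(edge_lists):
    """Phase 1: union of the undirected edges of all source networks."""
    merged = set()
    for edges in edge_lists:
        for u, v in edges:
            if u != v:  # ignore self-loops
                merged.add(frozenset((u, v)))
    return merged


def core_numbers(edges):
    """Compute k-core numbers by repeatedly peeling a minimum-degree node."""
    adj = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        n = min(remaining, key=lambda x: (degree[x], x))
        k = max(k, degree[n])
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core, adj


def type_edges(edges, core, threshold=2):
    """Phase 2 (assumed proxy): an edge is 'dense' when both endpoints lie
    in the k-core for k = threshold, otherwise 'sparse'."""
    return {
        e: "dense" if min(core[n] for n in e) >= threshold else "sparse"
        for e in edges
    }


def node_features(adj, typed):
    """Phase 3 (toy stand-in): explicit feature vector per node, counting
    incident edges of each type as (dense_count, sparse_count)."""
    feats = {}
    for n, nbrs in adj.items():
        counts = {"dense": 0, "sparse": 0}
        for m in nbrs:
            counts[typed[frozenset((n, m))]] += 1
        feats[n] = (counts["dense"], counts["sparse"])
    return feats


def linear_kernel(x, y):
    """Dot product of two explicit feature vectors."""
    return sum(a * b for a, b in zip(x, y))


# Two hypothetical source networks over genes g1..g4.
source_a = [("g1", "g2"), ("g2", "g3")]
source_b = [("g1", "g3"), ("g3", "g4"), ("g2", "g1")]

merged = merge_networks([source_a, source_b])
core, adj = core_numbers(merged)
typed = type_edges(merged, core, threshold=2)
feats = node_features(adj, typed)
print(feats["g3"], linear_kernel(feats["g1"], feats["g3"]))
```

Because the features are explicit vectors, any linear regularized classifier can consume them directly, which is the property the abstract highlights. A real node kernel for typed graphs would produce far richer features than these two counts.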
