Constructing an integrated gene similarity network for the identification of disease genes

Discovering novel genes that are involved in human diseases is a challenging task. In recent years, several computational approaches have been proposed to prioritize candidate disease genes. Most of these methods rely mainly on protein-protein interaction (PPI) networks. However, because these PPI networks contain false positives and cover less than half of known human genes, both their reliability and their coverage are low. It is therefore necessary to fuse multiple genomic data sources into a reliable gene similarity network and then infer disease genes on a genome-wide scale. Here, we propose a novel method, named RWRB, to infer the causal genes of a disease of interest. First, we construct five individual gene (protein) similarity networks based on multiple genomic data of human genes. Then, an integrated gene similarity network (IGSN) is constructed using the similarity network fusion (SNF) method. Finally, we apply the random walk with restart algorithm to the phenotype-gene bilayer network, which combines the phenotype similarity network, the IGSN, and the phenotype-gene association network, to prioritize candidate disease genes. We evaluate the effectiveness of RWRB in inferring phenotype-gene relationships through leave-one-out cross-validation. Results show that RWRB is more accurate than state-of-the-art methods on most evaluation metrics. Further analysis shows that the success of RWRB stems from the IGSN, which has wider coverage and higher reliability than current PPI networks.
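
The final step, random walk with restart on a phenotype-gene bilayer network, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the transition rule or parameter values, so the function names, the restart probability of 0.7, the equal split of restart mass between the query phenotype and its known genes, and the simple block layout of the adjacency matrix are all assumptions made for illustration.

```python
import numpy as np


def column_normalize(matrix):
    """Scale each column to sum to 1; all-zero columns are left as zeros."""
    matrix = np.asarray(matrix, dtype=float)
    sums = matrix.sum(axis=0, keepdims=True)
    return np.divide(matrix, sums, out=np.zeros_like(matrix), where=sums > 0)


def rwr_bilayer(gene_sim, pheno_sim, assoc, query_pheno,
                restart=0.7, tol=1e-10, max_iter=1000):
    """Score genes for one query phenotype (illustrative sketch only).

    gene_sim:  (n_gene, n_gene) gene similarity network (stand-in for the IGSN)
    pheno_sim: (n_pheno, n_pheno) phenotype similarity network
    assoc:     (n_pheno, n_gene) known phenotype-gene associations
    restart:   restart probability (assumed value, not from the paper)
    """
    assoc = np.asarray(assoc, dtype=float)
    n_pheno, n_gene = assoc.shape

    # Assumed layout: phenotype nodes first, then gene nodes.
    adjacency = np.block([[np.asarray(pheno_sim, dtype=float), assoc],
                          [assoc.T, np.asarray(gene_sim, dtype=float)]])
    transition = column_normalize(adjacency)

    # Restart vector: the query phenotype plus its known genes (assumed split).
    p0 = np.zeros(n_pheno + n_gene)
    known_genes = np.flatnonzero(assoc[query_pheno])
    if known_genes.size > 0:
        p0[query_pheno] = 0.5
        p0[n_pheno + known_genes] = 0.5 / known_genes.size
    else:
        p0[query_pheno] = 1.0

    # Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol.
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (transition @ p) + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break

    return p[n_pheno:]  # steady-state scores of the gene nodes


# Toy data: genes 0 and 1 are similar, genes 2 and 3 are similar.
gene_sim = np.array([[0.0, 0.9, 0.1, 0.1],
                     [0.9, 0.0, 0.1, 0.1],
                     [0.1, 0.1, 0.0, 0.9],
                     [0.1, 0.1, 0.9, 0.0]])
pheno_sim = np.array([[0.0, 0.2],
                      [0.2, 0.0]])
assoc = np.array([[1, 0, 0, 0],   # phenotype 0 is linked to gene 0
                  [0, 0, 1, 0]])  # phenotype 1 is linked to gene 2

scores = rwr_bilayer(gene_sim, pheno_sim, assoc, query_pheno=0)
ranking = np.argsort(-scores)  # candidate genes, best first
```

In this toy network, gene 1 is not associated with any phenotype, yet it scores higher than gene 3 for phenotype 0 because it is highly similar to the known gene 0. A denser and more reliable gene similarity layer gives the walker more such paths to follow, which is the role the abstract attributes to the IGSN.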
