Identifying disease genes from PPI networks weighted by gene expression under different conditions

The identification of disease genes is essential for deciphering the mechanisms of complex diseases. Many existing methods combine machine learning algorithms with network information to predict disease genes, relying on the ‘guilt by association’ assumption that disease genes lie close to each other in a biomolecular network. Although these methods have yielded many novel findings, most of them rely on ‘guilt by association’ alone and ignore how the edges of biomolecular networks change under different conditions, which limits their performance. To address this problem, we propose an algorithm that combines ‘guilt by association’ with ‘guilt by rewiring’ of biomolecular networks. The difference in gene co-expression between case and control samples is first used to capture the dynamic edge changes (rewiring) of biomolecular networks by weighting the edges of protein-protein interaction (PPI) networks. Then, features are extracted from the weighted PPI network. Finally, logistic regression is adopted to identify the disease genes. The algorithm achieves AUC values of 0.95, 0.90 and 0.92 on the identification of breast-cancer-related, lung-cancer-related and schizophrenia-related genes, respectively. Two new schizophrenia-related genes are also found in the ranked list of unknown genes.
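
The three steps described above (edge weighting, feature extraction, logistic regression) can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula or feature set, so everything concrete here is an assumption made for illustration: the rewiring weight is taken as the absolute difference of Pearson correlations between case and control samples, the two features (weighted degree and total weight to known disease neighbors) are hypothetical, and the logistic regression is a plain gradient-descent fit rather than the authors' implementation. All function names and the toy data are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rewiring_weights(edges, expr_case, expr_control):
    """Assumed rewiring score: weight each PPI edge by |r_case - r_control|."""
    weights = {}
    for u, v in edges:
        r_case = pearson(expr_case[u], expr_case[v])
        r_ctrl = pearson(expr_control[u], expr_control[v])
        weights[(u, v)] = abs(r_case - r_ctrl)
    return weights

def gene_features(genes, weights, known_disease):
    """Hypothetical features per gene: [weighted degree, weight to known disease neighbors]."""
    feats = {g: [0.0, 0.0] for g in genes}
    for (u, v), w in weights.items():
        feats[u][0] += w
        feats[v][0] += w
        if v in known_disease:
            feats[u][1] += w
        if u in known_disease:
            feats[v][1] += w
    return feats

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss; returns [bias, w1, w2, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Predicted probability that a gene with feature vector x is disease-related."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Toy data (invented): four genes, three PPI edges, three samples per condition.
genes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
expr_case = {"A": [1, 2, 3], "B": [1, 2, 3], "C": [2, 2, 3], "D": [3, 1, 2]}
expr_control = {"A": [1, 2, 3], "B": [3, 2, 1], "C": [2, 2, 3], "D": [3, 1, 2]}

weights = rewiring_weights(edges, expr_case, expr_control)
features = gene_features(genes, weights, known_disease={"A"})

# Fit on the genes with assumed labels, then score every gene and rank.
labels = {"A": 1, "B": 1, "C": 0, "D": 0}
model = fit_logistic([features[g] for g in genes], [labels[g] for g in genes])
ranking = sorted(genes, key=lambda g: predict(model, features[g]), reverse=True)
```

In this toy network the A-B edge is perfectly correlated in case samples and perfectly anti-correlated in controls, so it receives the largest rewiring weight, while the C-D edge is unchanged and gets weight zero. In practice a library implementation of correlation and logistic regression, together with cross-validated AUC, would replace the hand-written routines.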
