Inferring disease and gene set associations with rank coherence in networks

MOTIVATION To validate the candidate disease genes identified from high-throughput genomic studies, a necessary step is to elucidate the associations between the set of candidate genes and disease phenotypes. The conventional gene set enrichment analysis often fails to reveal associations between disease phenotypes and the gene sets with a short list of poorly annotated genes, because the existing annotations of disease-causative genes are incomplete. This article introduces a network-based computational approach called rcNet to discover the associations between gene sets and disease phenotypes. A learning framework is proposed to maximize the coherence between the predicted phenotype-gene set relations and the known disease phenotype-gene associations. An efficient algorithm coupling ridge regression with label propagation and two variants are designed to find the optimal solution to the objective functions of the learning framework. RESULTS We evaluated the rcNet algorithms with leave-one-out cross-validation on Online Mendelian Inheritance in Man (OMIM) data and an independent test set of recently discovered disease-gene associations. In the experiments, the rcNet algorithms achieved best overall rankings compared with the baselines. To further validate the reproducibility of the performance, we applied the algorithms to identify the target diseases of novel candidate disease genes obtained from recent studies of Genome-Wide Association Study (GWAS), DNA copy number variation analysis and gene expression profiling. The algorithms ranked the target disease of the candidate genes at the top of the rank list in many cases across all the three case studies. AVAILABILITY http://compbio.cs.umn.edu/dgsa_rcNet CONTACT kuang@cs.umn.edu.

[1]  A. Barabasi,et al.  The human disease network , 2007, Proceedings of the National Academy of Sciences.

[2]  David Martin,et al.  GOToolBox: functional analysis of gene datasets based on Gene Ontology , 2004, Genome Biology.

[3]  P. Robinson,et al.  Walking the interactome for prioritization of candidate disease genes. , 2008, American journal of human genetics.

[4]  Michael Q. Zhang,et al.  Network-based global inference of human disease genes , 2008, Molecular systems biology.

[5]  Rafael A Irizarry,et al.  Exploration, normalization, and summaries of high density oligonucleotide array probe level data. , 2003, Biostatistics.

[6]  A. Shlien,et al.  Copy number variations and cancer , 2009, Genome Medicine.

[7]  Andrew D. Johnson,et al.  Bmc Medical Genetics an Open Access Database of Genome-wide Association Results , 2009 .

[8]  Jagdish Chandra Patra,et al.  Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network , 2010, Bioinform..

[9]  E. Lander,et al.  Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma , 2007, Proceedings of the National Academy of Sciences.

[10]  M. DePamphilis,et al.  HUMAN DISEASE , 1957, The Ulster Medical Journal.

[11]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[12]  F. Collins,et al.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits , 2009, Proceedings of the National Academy of Sciences.

[13]  C. Wijmenga,et al.  Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. , 2006, American journal of human genetics.

[14]  Hanno Steen,et al.  Development of human protein reference database as an initial platform for approaching systems biology in humans. , 2003, Genome research.

[15]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[16]  T. Ideker,et al.  Network-based classification of breast cancer metastasis , 2007, Molecular systems biology.

[17]  R. Bernards,et al.  Enabling personalized cancer medicine through analysis of gene-expression patterns , 2008, Nature.

[18]  Petter Holme,et al.  Network Properties of Complex Human Disease Genes Identified through Genome-Wide Association Studies , 2009, PloS one.

[19]  K. Mossman The Wellcome Trust Case Control Consortium, U.K. , 2008 .

[20]  Matthew A. Hibbs,et al.  Exploring the human genome with functional maps. , 2009, Genome research.

[21]  Brad T. Sherman,et al.  Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources , 2008, Nature Protocols.

[22]  TaeHyun Hwang,et al.  A Heterogeneous Label Propagation Algorithm for Disease Gene Discovery , 2010, SDM.

[23]  G. Vriend,et al.  A text-mining analysis of the human phenome , 2006, European Journal of Human Genetics.

[24]  Sergio E Baranzini,et al.  The genetics of autoimmune diseases: a networked perspective. , 2009, Current opinion in immunology.

[25]  Roded Sharan,et al.  Associating Genes and Protein Complexes with Disease via Network Propagation , 2010, PLoS Comput. Biol..

[26]  Derek Y. Chiang,et al.  The landscape of somatic copy-number alteration across human cancers , 2010, Nature.

[27]  E. Snitkin,et al.  Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network , 2009, Genome Biology.

[28]  Alexander Zien,et al.  Label Propagation and Quadratic Criterion , 2006 .

[29]  Simon C. Potter,et al.  Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls , 2007, Nature.

[30]  V. McKusick Mendelian Inheritance in Man and Its Online Version, OMIM , 2007, The American Journal of Human Genetics.