Candidate Gene Prioritization Using Network Based Probabilistic Models

We present a new gene prioritization method that learns a probabilistic knowledge model from a knowledge base and free-text research documents and uses it to prioritize candidate genes. The knowledge model is a network of associations among domain entities (e.g., genes), extracted from a domain knowledge base (e.g., a protein-protein interaction database) or a corpus of text documents (e.g., PubMed documents). Probabilistic inference over this model is then applied to the task of candidate gene prioritization. We evaluate the method on five diseases and show that it outperforms a recently described network-based method for candidate gene prioritization.
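
To make the idea of ranking candidates over an association network concrete, the following is a minimal sketch, not the method of this paper. The abstract does not specify the inference procedure, so the sketch substitutes a generic random walk with restart from known disease genes. The gene names, edge weights, restart probability, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_network(edges):
    """Build a symmetric adjacency map from (gene_a, gene_b, weight) triples."""
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = w
        adj[b][a] = w
    return adj


def random_walk_with_restart(adj, seeds, restart=0.3, iters=200, tol=1e-12):
    """Return a visiting probability per gene for a walk restarting at the seeds.

    Assumption: this is a stand-in for probabilistic inference over the network,
    not the inference procedure used in the paper.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    strength = {n: sum(adj[n].values()) for n in nodes}
    for _ in range(iters):
        # Restart mass goes back to the seed genes.
        nxt = {n: restart * p0[n] for n in nodes}
        # Remaining mass moves along edges in proportion to edge weight.
        for n in nodes:
            if strength[n] == 0:
                continue
            for m, w in adj[n].items():
                nxt[m] += (1.0 - restart) * p[n] * w / strength[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def prioritize(adj, seeds, candidates):
    """Rank candidate genes by their score, highest first."""
    scores = random_walk_with_restart(adj, set(seeds))
    return sorted(candidates, key=lambda g: scores.get(g, 0.0), reverse=True)


# Toy chain network with placeholder names: SEED - GENE_A - GENE_B - GENE_C.
toy_edges = [("SEED", "GENE_A", 1.0), ("GENE_A", "GENE_B", 1.0), ("GENE_B", "GENE_C", 1.0)]
toy_network = build_network(toy_edges)
ranking = prioritize(toy_network, ["SEED"], ["GENE_C", "GENE_A", "GENE_B"])
```

In this toy chain, candidates closer to the seed gene receive higher scores, which is the intuition behind network-based prioritization: genes strongly associated with known disease genes are ranked first. A learned probabilistic model, as described above, would replace the hand-set edge weights with association strengths estimated from a knowledge base or a text corpus.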
