In systems biology, it is of great interest to identify new genes that were not previously reported to be associated with biological pathways related to various functions and diseases. Identifying these new pathway-modulating genes not only promotes understanding of pathway regulation mechanisms but also allows identification of novel therapeutic targets. Recently, the biomedical literature has been recognized as a valuable resource for investigating pathway-modulating genes. While most currently available approaches are based on the co-occurrence of genes within an abstract, these approaches have been reported to show only sub-optimal performance because 70% of abstracts contain information on only a single gene. To overcome this limitation, we propose a novel statistical framework based on the concept of the ontology fingerprint, which uses the Gene Ontology to extract information from large-scale biomedical literature data. The proposed framework simultaneously identifies pathway-modulating genes and facilitates interpretation of the functions of these new genes. We also propose a computationally efficient posterior inference procedure based on a Metropolis-Hastings within Gibbs sampler for parameter updates and the poor man's reversible jump Markov chain Monte Carlo approach for model selection. We evaluate the proposed statistical framework with simulation studies, experimental validation, and an application to studies of pathway-modulating genes in yeast. The R implementation of the proposed model is currently available at https://dongjunchung.github.io/bayesGO/. Copyright © 2017 John Wiley & Sons, Ltd.
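The Metropolis-Hastings within Gibbs scheme mentioned above can be illustrated generically. The sketch below is not the bayesGO model or its code: it assumes a hypothetical toy model (Normal data with unknown mean `mu` and log standard deviation `eta`), and the priors, step size, and function names (`mh_within_gibbs`, `log_cond_eta`) are illustrative assumptions. The idea it shows is that parameters with a closed-form full conditional are drawn exactly (a Gibbs step), while parameters without one are updated by a Metropolis-Hastings step inside the same sweep.

```python
import math
import random


def log_cond_eta(eta, mu, y):
    """Log full conditional of eta = log(sigma), up to an additive constant.

    Toy model (assumed): y_i ~ Normal(mu, sigma^2) with prior eta ~ Normal(0, 1).
    """
    sigma2 = math.exp(2.0 * eta)
    ss = sum((yi - mu) ** 2 for yi in y)
    return -len(y) * eta - ss / (2.0 * sigma2) - 0.5 * eta ** 2


def mh_within_gibbs(y, n_iter=5000, tau0=10.0, step=0.1, seed=1):
    """Generic Metropolis-Hastings-within-Gibbs sampler for the toy model.

    mu has a conjugate Normal(0, tau0^2) prior, so it gets an exact Gibbs draw.
    eta = log(sigma) has no closed-form conditional, so it gets a random-walk MH step.
    Returns the list of (mu, sigma) draws and the MH acceptance rate for eta.
    """
    rng = random.Random(seed)
    n, total = len(y), sum(y)
    mu, eta = 0.0, 0.0
    draws, accepted = [], 0
    for _ in range(n_iter):
        # Gibbs step: mu | sigma, y is Normal with the precision-weighted mean below.
        sigma2 = math.exp(2.0 * eta)
        prec = 1.0 / tau0 ** 2 + n / sigma2
        mu = rng.gauss((total / sigma2) / prec, math.sqrt(1.0 / prec))

        # MH step: symmetric proposal, so the acceptance ratio is the posterior ratio.
        prop = eta + rng.gauss(0.0, step)
        log_ratio = log_cond_eta(prop, mu, y) - log_cond_eta(eta, mu, y)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            eta = prop
            accepted += 1
        draws.append((mu, math.exp(eta)))
    return draws, accepted / n_iter


# Usage on simulated data (true mu = 2.0, sigma = 1.5); discard 1000 draws as burn-in.
data_rng = random.Random(0)
y = [data_rng.gauss(2.0, 1.5) for _ in range(200)]
draws, acc = mh_within_gibbs(y)
kept = draws[1000:]
post_mu = sum(d[0] for d in kept) / len(kept)
post_sigma = sum(d[1] for d in kept) / len(kept)
print(f"posterior mean mu={post_mu:.3f}, sigma={post_sigma:.3f}, MH acceptance={acc:.2f}")
```

Mixing the two update types keeps exact, cheap draws wherever conjugacy allows and falls back to a Metropolis-Hastings step only for the non-conjugate block; the model-selection step is outside the scope of this sketch.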