Inference from genome‐wide association studies using a novel Markov model

In this paper we propose a Bayesian modeling approach to the analysis of genome‐wide association studies based on single nucleotide polymorphism (SNP) data. Our latent seed model combines aspects of k‐means clustering, hidden Markov models (HMMs) and logistic regression into a fully Bayesian model. It is fitted by Markov chain Monte Carlo (MCMC) simulation with Metropolis‐Hastings update steps. The approach is flexible in two respects: it allows different types of genetic models, and it can be easily extended while remaining computationally feasible, because it uses fast algorithms for HMMs. It allows inference primarily on the location of the causal locus, and also on other parameters of interest. We use the latent seed model to analyze three data sets that pair real SNP data with both synthetic and real disease phenotypes, and the results are promising. Our method correctly identifies the causal locus both in examples where single‐SNP analysis succeeds in identifying the causal SNP and in examples where it fails. Genet. Epidemiol. 2008. © 2008 Wiley‐Liss, Inc.
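
The abstract attributes the method's computational feasibility to fast algorithms for HMMs without naming them. As a minimal sketch of what such an algorithm can look like, the block below implements the standard scaled forward recursion, which computes the likelihood of an observed sequence in time linear in its length rather than by summing over every hidden path. This is an illustration and not the latent seed model itself: the two hidden states, the three observed symbols (imagined here as genotype codes 0, 1, 2), and all probabilities in `INIT`, `TRANS`, `EMIT` and `OBS` are hypothetical values chosen only for the example.

```python
import math
from itertools import product


def forward_log_likelihood(init, trans, emit, obs):
    """Log-likelihood of obs under a discrete HMM via the scaled forward pass.

    init[i]     : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    n = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step along the chain, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        # Rescale to avoid underflow on long sequences; accumulate the log scale.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def brute_force_log_likelihood(init, trans, emit, obs):
    """Same quantity by explicit enumeration of every hidden path (exponential cost)."""
    n = len(init)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return math.log(total)


# Hypothetical toy parameters: 2 hidden states, 3 observable symbols.
INIT = [0.6, 0.4]
TRANS = [[0.9, 0.1],
         [0.2, 0.8]]
EMIT = [[0.7, 0.2, 0.1],
        [0.1, 0.3, 0.6]]
OBS = [0, 0, 2, 1, 2, 2, 0]

if __name__ == "__main__":
    print(forward_log_likelihood(INIT, TRANS, EMIT, OBS))
    print(brute_force_log_likelihood(INIT, TRANS, EMIT, OBS))
```

The two functions should agree up to floating-point error. The forward pass costs on the order of (sequence length) × (number of states)², whereas enumeration grows exponentially with sequence length, which is the kind of saving that makes HMM likelihoods practical to evaluate repeatedly inside an MCMC sampler.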
