Disease association tests by inferring ancestral haplotypes using a hidden Markov model

MOTIVATION: Most genome-wide association studies rely on single nucleotide polymorphism (SNP) analyses to identify causal loci. The increased stringency required for genome-wide analyses (with a per-SNP significance threshold of typically about 10^-7) means that many real signals will be missed. It therefore remains highly relevant to develop methods with improved power at low type I error. Haplotype-based methods provide a promising approach; however, they suffer from statistical problems such as an abundance of rare haplotypes and ambiguity in defining haplotype block boundaries.

RESULTS: We have developed an ancestral haplotype clustering (AncesHC) association method which addresses many of these problems. It can be applied to biallelic or multiallelic markers typed in haploid, diploid or multiploid organisms, and it also handles missing genotypes. Our model does not assume a rigid block structure, but it recognizes a block-like structure if one exists in the data. We employ a hidden Markov model (HMM) to cluster the haplotypes into groups of predicted common ancestral origin. We then test each cluster for association with disease by comparing the numbers of cases and controls with 0, 1 and 2 chromosomes in the cluster. We demonstrate the power of this approach by simulating case-control status under a range of disease models for 1500 outcrossed mice originating from eight inbred lines. Our results suggest that AncesHC has substantially more power than single-SNP analyses to detect disease association, and that it is also more powerful than the cladistic haplotype clustering method CLADHC.

AVAILABILITY: The software can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin.
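The cluster-level comparison described under RESULTS can be illustrated with a small sketch. The abstract does not say which statistic AncesHC uses, so the code below assumes a plain Pearson chi-square test on the 2 x 3 table of cases and controls carrying 0, 1 or 2 chromosomes in a cluster. The function name `cluster_association_test` and the example counts are hypothetical and are not taken from the software or the paper.

```python
import math


def cluster_association_test(case_counts, control_counts):
    """Pearson chi-square test on a 2 x 3 table of cluster-copy counts.

    case_counts / control_counts: numbers of individuals carrying
    0, 1 and 2 chromosomes assigned to the cluster.
    Returns (chi-square statistic, degrees of freedom, p-value).
    Assumption: this is an illustrative test, not necessarily the
    statistic used by AncesHC.
    """
    if len(case_counts) != 3 or len(control_counts) != 3:
        raise ValueError("expected counts for 0, 1 and 2 copies")

    # Drop copy-number categories that no individual falls into.
    cols = [(a, b) for a, b in zip(case_counts, control_counts) if a + b > 0]
    n_cases = sum(a for a, _ in cols)
    n_controls = sum(b for _, b in cols)
    total = n_cases + n_controls
    df = len(cols) - 1
    if df < 1 or n_cases == 0 or n_controls == 0:
        return 0.0, 0, 1.0  # nothing to compare

    stat = 0.0
    for a, b in cols:
        col_total = a + b
        for observed, row_total in ((a, n_cases), (b, n_controls)):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected

    # Closed-form chi-square upper tail probabilities for 1 and 2 df.
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    else:
        p_value = math.exp(-stat / 2.0)
    return stat, df, p_value


# Hypothetical counts for one cluster (made up for illustration).
cases = (30, 50, 20)      # cases with 0, 1, 2 chromosomes in the cluster
controls = (60, 30, 10)   # controls with 0, 1, 2 chromosomes in the cluster
stat, df, p_value = cluster_association_test(cases, controls)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.2e}")
```

In a genome-wide setting, the resulting p-value would then be compared against a stringent threshold such as the per-SNP level of about 10^-7 mentioned under MOTIVATION, with whatever multiple-testing adjustment suits the number of clusters tested.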
