Multilocus association mapping using variable-length Markov chains.

I propose a new method for association-based gene mapping that makes powerful use of multilocus data, is computationally efficient, and is straightforward to apply over large genomic regions. The approach is based on fitting variable-length Markov chain models, which automatically adapt to the degree of linkage disequilibrium (LD) between markers to create a parsimonious model of the LD structure. Edges of the fitted graph are tested for association with trait status. This approach can be thought of as haplotype testing with sophisticated windowing that accounts for the extent of LD, reducing the degrees of freedom and the number of tests while maximizing information. I present analyses of two published data sets showing that this approach can have better power than single-marker tests or sliding-window haplotypic tests.
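To make the idea concrete, the following is a minimal sketch in Python, not the fitting procedure of the paper. It assumes phased haplotypes coded as 0/1 allele lists and a binary trait status per haplotype, and it simplifies a variable-length Markov chain to one memory length per marker (a full fit would choose the memory separately for each context). The function names, the toy data, and the log-likelihood threshold `min_gain=1.92` are all illustrative assumptions. Each "edge" here is a (preceding context, allele) transition, tested against trait status with a 2x2 chi-square statistic.

```python
from collections import Counter
from math import log

def context_counts(haps, pos, depth):
    """Allele counts at marker `pos`, grouped by the `depth` preceding alleles."""
    counts = {}
    for h in haps:
        ctx = tuple(h[pos - depth:pos])
        counts.setdefault(ctx, Counter())[h[pos]] += 1
    return counts

def log_lik(counts):
    """Maximized conditional log-likelihood of the alleles given their contexts."""
    total = 0.0
    for alleles in counts.values():
        n_ctx = sum(alleles.values())
        total += sum(n * log(n / n_ctx) for n in alleles.values())
    return total

def choose_depth(haps, pos, max_depth=3, min_gain=1.92):
    """Lengthen the memory while it improves the fit by at least `min_gain`.

    Assumption: one depth per marker and an arbitrary threshold; this is a
    simplification of a true variable-length fit, for illustration only.
    """
    depth = 0
    best = log_lik(context_counts(haps, pos, 0))
    while depth < min(max_depth, pos):
        cand = log_lik(context_counts(haps, pos, depth + 1))
        if cand - best < min_gain:
            break
        depth, best = depth + 1, cand
    return depth

def edge_chi2(haps, status, pos, depth):
    """2x2 chi-square (edge carried or not, case or control) for each edge."""
    n_case = sum(status)
    n_ctrl = len(status) - n_case
    on_edge = Counter()
    for h, s in zip(haps, status):
        edge = (tuple(h[pos - depth:pos]), h[pos])
        on_edge[(edge, s)] += 1
    stats = {}
    for edge in {e for e, _ in on_edge}:
        a, b = on_edge[(edge, 1)], on_edge[(edge, 0)]
        c, d = n_case - a, n_ctrl - b
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        stats[edge] = (a + b + c + d) * (a * d - b * c) ** 2 / denom if denom else 0.0
    return stats

# Toy data (hypothetical): marker 1 is in complete LD with marker 0,
# marker 2 is independent of both; status follows the allele at marker 0.
haps = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]] * 5
status = [h[0] for h in haps]

depths = [choose_depth(haps, pos) for pos in range(3)]
stats = edge_chi2(haps, status, 1, depths[1])
```

On the toy data, the sketch keeps one marker of memory where LD is present (marker 1) and none where it is absent (marker 2), so fewer distinct edges, and hence fewer tests, are generated than a fixed-width sliding window would produce.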
