Evaluation of the haplotype motif model using the principle of minimum description

We apply minimum description length (MDL) principles to evaluate the merit of relaxing the rigidity of block models of haplotype structure. To do so, we develop an MDL formulation of the more general "haplotype motif" model of haplotype structure, similar to an approach proposed independently by Koivisto et al. [K+04]. Comparison of equivalent block and motif MDL models on real and simulated data reveals that the more flexible motif models can yield substantially shorter explanations of the data, suggesting that motifs capture the true nature of haplotype conservation more accurately. These benefits are less pronounced in real than in simulated data, however, and depend on the coverage level, marker density, and intrinsic recombination rate of each data set.
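As a rough illustration of how an MDL comparison between candidate structures works, the sketch below scores a block partition with a toy two-part code: the bits needed to write out the distinct patterns in each block, plus the bits needed to say which pattern each sequence carries. This is not the formulation developed in this work. The function name `block_description_length`, the uniform pattern code, the binary alphabet, and the omission of any cost for encoding block boundaries are all simplifying assumptions made only for this example.

```python
from math import log2


def block_description_length(haplotypes, blocks):
    """Toy two-part code length (in bits) of binary haplotypes under a block partition.

    Hypothetical helper for illustration only; it ignores the cost of
    encoding the block boundaries themselves.
    """
    n = len(haplotypes)
    total = 0.0
    for start, end in blocks:
        # Distinct patterns observed inside this block.
        patterns = {h[start:end] for h in haplotypes}
        # Model cost: one bit per site for every distinct pattern.
        model_bits = len(patterns) * (end - start)
        # Data cost: each sequence names its pattern with a uniform code.
        data_bits = n * log2(len(patterns))
        total += model_bits + data_bits
    return total


# Four made-up sequences whose two halves vary independently.
haplotypes = ["0000", "0011", "1100", "1111"]
one_block = block_description_length(haplotypes, [(0, 4)])
two_blocks = block_description_length(haplotypes, [(0, 2), (2, 4)])
print(one_block, two_blocks)
```

On this made-up input the two-block partition gives the shorter description, so under this toy criterion it is the preferred structure. Comparing a block model against a motif model follows the same logic: whichever model encodes the same data in fewer bits is favored.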
