Constrained hidden Markov models for population-based haplotyping

Background
Haplotype reconstruction is the problem of resolving the hidden phase information in genotype data obtained from laboratory measurements. Solving this problem is an important intermediate step in gene association studies, which seek to uncover the genetic basis of complex diseases. We propose a novel approach to haplotype reconstruction based on constrained hidden Markov models. Models are constructed by incrementally refining and regularizing the structure of a simple generative model for genotype data under Hardy-Weinberg equilibrium.

Results
The proposed method is evaluated on real-world and simulated population data. Results show that it is competitive with other recently proposed methods in terms of reconstruction accuracy, while offering a particularly good trade-off between computational cost and quality of results for large datasets.

Conclusion
Relatively simple probabilistic approaches to haplotype reconstruction based on structured hidden Markov models are competitive with more complex, well-established techniques in this field.
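To make the phasing problem and the Hardy-Weinberg generative model concrete, here is a minimal sketch. It is not the authors' method. The proposed approach learns a constrained hidden Markov model from the genotypes, whereas this sketch assumes a known, hypothetical table of haplotype frequencies. Under Hardy-Weinberg equilibrium an individual's two haplotypes are drawn independently. The probability of a genotype is therefore the sum of p(h1)·p(h2) over all ordered haplotype pairs compatible with it, and the most probable pair gives the reconstructed phase. The genotype encoding (0 or 1 for homozygous sites, 2 for heterozygous) and all function names are assumptions made for illustration.

```python
from itertools import product


def compatible_pairs(genotype):
    """Enumerate unordered haplotype pairs that explain a genotype.

    Assumed encoding (hypothetical): 0 or 1 = homozygous for that allele,
    2 = heterozygous. Missing genotypes are not handled in this sketch.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    if not het:
        # Fully homozygous: both haplotypes equal the genotype itself.
        h = tuple(genotype)
        return [(h, h)]
    pairs = []
    # Fix the phase at the first heterozygous site so that each unordered
    # pair is listed once: 2**(k-1) pairs for k heterozygous sites.
    for bits in product((0, 1), repeat=len(het) - 1):
        phase = (0,) + bits
        h1, h2 = list(genotype), list(genotype)
        for site, allele in zip(het, phase):
            h1[site] = allele
            h2[site] = 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def genotype_probability(genotype, hap_freq):
    """P(genotype) under Hardy-Weinberg equilibrium, given haplotype frequencies."""
    total = 0.0
    for h1, h2 in compatible_pairs(genotype):
        p = hap_freq.get(h1, 0.0) * hap_freq.get(h2, 0.0)
        # Distinct haplotypes can be inherited in either order.
        total += p if h1 == h2 else 2 * p
    return total


def most_likely_phase(genotype, hap_freq):
    """Return the compatible haplotype pair with the highest probability."""
    return max(
        compatible_pairs(genotype),
        key=lambda pair: hap_freq.get(pair[0], 0.0) * hap_freq.get(pair[1], 0.0),
    )


if __name__ == "__main__":
    # Hypothetical haplotype frequencies over three SNPs.
    freqs = {(0, 0, 0): 0.4, (1, 1, 1): 0.3, (0, 1, 0): 0.2, (1, 0, 1): 0.1}
    print(most_likely_phase((2, 2, 2), freqs))
    print(genotype_probability((2, 2, 2), freqs))
```

The number of compatible pairs grows as 2^(k-1) in the number k of heterozygous sites, so this brute-force enumeration is only practical for short genotypes.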
