Model-based inference of haplotype block variation

The uneven recombination structure of human DNA has been highlighted by several recent studies. Knowledge of the haplotype blocks generated by this phenomenon can be applied to dramatically increase the statistical power of genetic mapping. Several criteria have already been proposed for identifying these blocks, all of which require haplotypes as input. We propose a comprehensive statistical model of haplotype block variation and show how the parameters of this model can be learned from haplotypes and/or unphased genotype data. Using real-world SNP data, we demonstrate that our approach can be used to resolve genotypes into their constituent haplotypes with greater accuracy than previously known methods.
