A new statistical method for haplotype reconstruction from population data.

Current routine genotyping methods typically do not provide haplotype information, which is essential for many analyses of fine-scale molecular-genetics data. Haplotypes can be obtained, at considerable cost, either experimentally or (partially) by genotyping additional family members. Alternatively, a statistical method can be used to infer phase and reconstruct haplotypes. We present a new statistical method, applicable to genotype data at linked loci from a population sample, that improves substantially on current algorithms; error rates are often reduced by more than 50% relative to the nearest competing method. Furthermore, our algorithm performs well in absolute terms, suggesting that reconstructing haplotypes experimentally or by genotyping additional family members may be an inefficient use of resources.
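
To make the phase problem concrete, the following minimal sketch enumerates every haplotype pair consistent with one unphased diploid genotype. It is an illustration of the ambiguity only, not the method presented here; the function name `compatible_haplotype_pairs` and the encoding of alleles as 0/1 integers are assumptions made for the example.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate the unordered haplotype pairs consistent with a genotype.

    `genotype` is a list with one unordered allele pair per locus,
    e.g. [(0, 1), (0, 1), (1, 1)]. (Hypothetical encoding for illustration.)
    """
    # Only heterozygous loci contribute to phase ambiguity.
    het_loci = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # Try both orientations at each heterozygous locus.
    for flips in product([False, True], repeat=len(het_loci)):
        flip_at = dict(zip(het_loci, flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        # A frozenset makes the pair unordered, so mirror images collapse.
        pairs.add(frozenset([tuple(hap1), tuple(hap2)]))
    return pairs


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs([(0, 1), (0, 1), (1, 1)]):
        print(sorted(pair))
```

An individual with k heterozygous loci (k ≥ 1) has 2^(k-1) compatible haplotype pairs, so the number of candidate resolutions grows quickly with the number of loci; a statistical method has to choose among them using information from the rest of the sample.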
