Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population.

Molecular techniques allow the survey of a large number of linked polymorphic loci in random samples from diploid populations. However, the gametic phase of haplotypes is usually unknown when diploid individuals are heterozygous at more than one locus. To overcome this difficulty, we implement an expectation-maximization (EM) algorithm leading to maximum-likelihood estimates of molecular haplotype frequencies under the assumption of Hardy-Weinberg proportions. The performance of the algorithm is evaluated for simulated data representing both DNA sequences and highly polymorphic loci with different levels of recombination. As expected, the EM algorithm is found to perform best for large samples, regardless of recombination rates among loci. To ensure finding the global maximum-likelihood estimate, the EM algorithm should be started from several initial conditions. The present approach appears to be useful for the analysis of nuclear DNA sequences or highly variable loci. Although the algorithm, in principle, can accommodate an arbitrary number of loci, there are practical limitations because the computing time grows exponentially with the number of polymorphic loci.
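The EM iteration summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes biallelic loci, with each genotype coded as the number of copies of allele 1 at each locus (0, 1 or 2), and all function names (`compatible_pairs`, `em_haplotype_frequencies`, `multi_start`) are hypothetical. The E-step weights every phased haplotype pair compatible with a genotype by its Hardy-Weinberg probability, and the M-step re-estimates haplotype frequencies from the expected counts.

```python
import math
import random
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs compatible with an unphased genotype.

    genotype: tuple with the count of allele 1 (0, 1 or 2) at each locus.
    With k heterozygous loci there are 2**(k-1) distinct pairs (1 if k == 0).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]  # het loci are filled in below
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def pair_probability(freq, a, b):
    """Hardy-Weinberg probability of an unordered haplotype pair."""
    return freq[a] * freq[b] * (1 if a == b else 2)


def log_likelihood(expansions, freq):
    """Log-likelihood of the sample given haplotype frequencies."""
    return sum(
        math.log(sum(pair_probability(freq, a, b) for a, b in pairs))
        for pairs in expansions
    )


def em_haplotype_frequencies(genotypes, init=None, max_iter=1000, tol=1e-10):
    """Run EM from one starting point and return {haplotype: frequency}."""
    expansions = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in expansions for pair in pairs for h in pair})
    # Default start: equal frequencies for every haplotype seen in an expansion.
    freq = dict(init) if init else {h: 1.0 / len(haps) for h in haps}
    n_gametes = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in expansions:
            # E-step: posterior probability of each phase for this individual.
            weights = [pair_probability(freq, a, b) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of gametes.
        new = {h: counts[h] / n_gametes for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq


def multi_start(genotypes, n_starts=5, seed=0):
    """Run EM from several random starts and keep the highest-likelihood result."""
    rng = random.Random(seed)
    expansions = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in expansions for pair in pairs for h in pair})
    best, best_ll = None, -math.inf
    for _ in range(n_starts):
        raw = {h: rng.random() + 1e-3 for h in haps}  # strictly positive weights
        norm = sum(raw.values())
        init = {h: w / norm for h, w in raw.items()}
        freq = em_haplotype_frequencies(genotypes, init=init)
        ll = log_likelihood(expansions, freq)
        if ll > best_ll:
            best, best_ll = freq, ll
    return best
```

For example, `em_haplotype_frequencies([(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)])` resolves the single double heterozygote using the six homozygotes around it. Because each genotype with k heterozygous loci expands into 2^(k-1) phased pairs, the enumeration in `compatible_pairs` also shows why computing time grows exponentially with the number of polymorphic loci.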
