Notes on the Maximum Likelihood Estimation of Haplotype Frequencies

Maximum likelihood estimation (MLE) is one of the most popular ways to estimate haplotype frequencies in a population from genotype data whose linkage phases are unknown. It is commonly implemented with the Expectation-Maximization (EM) algorithm. However, the EM algorithm can converge to a local maximum or a saddle point of the likelihood surface instead of the global maximum, which can cause serious errors in the estimated haplotype frequencies. In this note, we derive the necessary and sufficient conditions under which local maxima or saddle points appear on the likelihood surface. As a rule of thumb, the likelihood surface is unimodal if the difference between the coupling and repulsion haplotype frequencies in phase-known individuals is larger than 3/2 times the frequency of phase-ambiguous individuals. This is a sufficient condition. We also present the analytic solution to the biallelic two-locus problem and construct a general algorithm for obtaining the global maximum.
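
The EM iteration for the biallelic two-locus case can be sketched as follows. This is a minimal illustration built on our own assumptions. It does not reproduce the authors' analytic solution or their global-maximum algorithm. Genotypes are coded 0/1/2 at each locus in a 3-by-3 count table, and only the double heterozygote AaBb is treated as phase ambiguous. The helper names (`known_haplotype_counts`, `freqs_from_weight`, `em`, `log_likelihood`, `best_of_starts`) are hypothetical. The M-step is fully determined by the share w of double heterozygotes assigned to the coupling phase AB/ab, so the iteration reduces to a one-dimensional map on w. As a plain workaround for convergence to a local maximum, `best_of_starts` reruns EM from several starting weights and keeps the fit with the highest log-likelihood. This multi-start heuristic is a swapped-in technique, not the method of this note, and it is not guaranteed to reach the global maximum.

```python
import math

# Hypothetical genotype coding: 0 = homozygous for the first allele (A or B),
# 1 = heterozygous, 2 = homozygous for the second allele (a or b).
# Each code maps to the allele indices (0 = A/B, 1 = a/b) it carries.
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}


def known_haplotype_counts(counts):
    """Haplotype counts [AB, Ab, aB, ab] from phase-known individuals.

    counts[g1][g2] is the number of individuals with genotype g1 at locus 1
    and g2 at locus 2. Only the double heterozygote (1, 1) is phase
    ambiguous; its count is returned separately.
    """
    c = [0.0, 0.0, 0.0, 0.0]
    for g1 in range(3):
        for g2 in range(3):
            if (g1, g2) == (1, 1):
                continue
            # With at most one heterozygous locus, pairing alleles in order
            # reconstructs both haplotypes unambiguously.
            for a1, a2 in zip(ALLELES[g1], ALLELES[g2]):
                c[2 * a1 + a2] += counts[g1][g2]
    return c, counts[1][1]


def freqs_from_weight(c, n_dh, n_chrom, w):
    """M-step: count double heterozygotes as AB/ab (share w) or Ab/aB (1 - w)."""
    return [(c[0] + n_dh * w) / n_chrom,
            (c[1] + n_dh * (1 - w)) / n_chrom,
            (c[2] + n_dh * (1 - w)) / n_chrom,
            (c[3] + n_dh * w) / n_chrom]


def em(counts, w=0.5, tol=1e-12, max_iter=10_000):
    """Plain EM started from coupling weight w in [0, 1]; returns (freqs, w)."""
    c, n_dh = known_haplotype_counts(counts)
    n_chrom = 2 * sum(sum(row) for row in counts)
    for _ in range(max_iter):
        p = freqs_from_weight(c, n_dh, n_chrom, w)
        # E-step: posterior probability that a double heterozygote is AB/ab.
        coupling, repulsion = p[0] * p[3], p[1] * p[2]
        total = coupling + repulsion
        new_w = coupling / total if total > 0 else 0.5
        converged = abs(new_w - w) < tol
        w = new_w
        if converged:
            break
    return freqs_from_weight(c, n_dh, n_chrom, w), w


def log_likelihood(counts, p):
    """Log-likelihood of the genotype data, up to an additive constant."""
    c, n_dh = known_haplotype_counts(counts)
    ll = sum(ci * math.log(pi) for ci, pi in zip(c, p) if ci > 0)
    if n_dh > 0:
        # A double heterozygote is AB/ab or Ab/aB.
        ll += n_dh * math.log(2 * (p[0] * p[3] + p[1] * p[2]))
    return ll


def best_of_starts(counts, starts=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Multi-start heuristic: run EM from several weights, keep the best fit."""
    fits = []
    for w0 in starts:
        p, w = em(counts, w0)
        fits.append((log_likelihood(counts, p), p, w))
    return max(fits, key=lambda fit: fit[0])
```

Consider the made-up counts `[[6, 0, 5], [0, 40, 0], [5, 0, 6]]`, where 40 of 62 individuals are double heterozygotes. Here `em(counts, 0.5)` should settle near w = 0.94, while `em(counts, 0.0)` stops near w = 0.13 at a fixed point with a lower log-likelihood. The starting value alone therefore decides which answer plain EM reports, and `best_of_starts` keeps the better of the two.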
