Penalized estimation of haplotype frequencies

MOTIVATION: Low haplotype diversity and linkage disequilibrium are the rule in short genomic segments. This fact suggests that parsimony should be enforced in the estimation of haplotype frequencies. The current article introduces a diversity penalty that automatically discards potential haplotypes with low explanatory power. The standard EM algorithm for haplotype frequency estimation can accommodate the penalty if one moves to a more general minorize-maximize (MM) scheme for estimation.

RESULTS: Our new MM algorithm converges in fewer iterations, eliminates marginal haplotypes from further consideration, and reduces the computational complexity of each iteration. Estimation by the MM algorithm also improves haplotyping and genotype imputation compared to naive application of the EM algorithm. Thus, the MM algorithm is a useful substitute for the EM algorithm. Compared to the most sophisticated current methods of haplotyping and genotype imputation, the MM algorithm is slightly less accurate but at least an order of magnitude faster.

AVAILABILITY: Our software will be made available in the next release of the program Mendel at http://www.genetics.ucla.edu/software/.
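For orientation, the baseline that the abstract calls the standard EM algorithm can be sketched in a few lines. The code below is only an illustration of unpenalized EM under Hardy-Weinberg equilibrium for biallelic markers; it is not the MM algorithm of the article, whose diversity penalty is not specified in this abstract. The function names `compatible_pairs` and `em_haplotype_frequencies`, the 0/1/2 genotype coding, and the uniform starting frequencies are all assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List ordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of allele counts per marker (0, 1, or 2 copies of
    allele 1). This coding is an assumption made for the sketch.
    """
    options = []
    for g in genotype:
        if g == 0:
            options.append([(0, 0)])
        elif g == 2:
            options.append([(1, 1)])
        else:
            # Heterozygous marker: phase is unknown, so keep both orders.
            options.append([(0, 1), (1, 0)])
    pairs = []
    for combo in product(*options):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=100):
    """Plain (unpenalized) EM estimate of haplotype frequencies."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    # Start from uniform frequencies over all candidate haplotypes.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pair_lists:
            # E step: posterior weight of each ordered pair given current freq.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M step: each person carries two haplotypes.
        n_haplotypes = 2.0 * len(genotypes)
        freq = {h: counts[h] / n_haplotypes for h in haplotypes}
    return freq
```

In a toy sample where double homozygotes dominate and a few double heterozygotes are ambiguous, the estimates concentrate on the two haplotypes that already explain the homozygotes, while the alternative phasing shrinks toward zero without ever being removed from the candidate list. That lingering of marginal haplotypes is the behavior the abstract says the penalized MM scheme is designed to cut short.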
