A new algorithm for haplotype‐based association analysis: the Stochastic‐EM algorithm

It is now widely accepted that haplotypic information can be of great interest for investigating the role of a candidate gene in the etiology of complex diseases. In the absence of family data, haplotypes cannot be deduced from genotypes, except for individuals who are homozygous at all loci or heterozygous at only one site. Statistical methods are therefore required to infer haplotypes from genotypic data and to test their association with a phenotype of interest. Two maximum likelihood algorithms are often used in haplotype‐based association studies: the Newton‐Raphson (NR) and the Expectation‐Maximisation (EM) algorithms. To circumvent the limitations of both algorithms, including convergence to local minima and saddle points, we describe here how a stochastic version of the EM algorithm, referred to as SEM, can be used for testing haplotype‐phenotype association. The statistical properties of the SEM algorithm were investigated in a simulation study covering a wide range of practical situations, including small and large samples and rare and frequent haplotypes, and the results were compared with those obtained with the standard NR algorithm. The simulation study indicated that the SEM algorithm gives results similar to those of the NR algorithm, which makes it of great interest for haplotype‐based association analysis, especially when the number of polymorphisms is large.
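As a rough illustration of the idea, the sketch below applies a stochastic EM scheme to the simpler problem of estimating haplotype frequencies from unphased genotypes. It is not the authors' implementation and includes no phenotype model. It assumes biallelic loci coded 0/1/2, Hardy-Weinberg equilibrium, and a uniform starting point; the function names `compatible_pairs` and `sem_haplotype_frequencies` and the parameters `n_iter`, `burn_in` and `seed` are hypothetical.

```python
import itertools
import random
from collections import Counter


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of 0/1/2 counts of the '1' allele at each biallelic locus.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous loci are fixed: 0 -> 0, 2 -> 1
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Fix the first heterozygous locus to 0 on h1 so each pair is listed once.
    for phase in itertools.product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for locus, allele in zip(het, (0,) + phase):
            h1[locus] = allele
            h2[locus] = 1 - allele
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def sem_haplotype_frequencies(genotypes, n_iter=500, burn_in=100, seed=0):
    """Stochastic EM sketch: sample phases, re-estimate frequencies, average."""
    rng = random.Random(seed)
    n_loci = len(genotypes[0])
    haplotypes = list(itertools.product((0, 1), repeat=n_loci))
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # assumed uniform start
    candidates = [compatible_pairs(g) for g in genotypes]
    total = 2 * len(genotypes)
    running = Counter()
    for it in range(n_iter):
        counts = Counter()
        # S-step: draw one haplotype pair per individual. Under Hardy-Weinberg
        # the pair probability is proportional to freq[h1] * freq[h2]; the
        # factor 2 for heterozygous pairs is constant within an individual.
        for pairs in candidates:
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            h1, h2 = rng.choices(pairs, weights=weights, k=1)[0]
            counts[h1] += 1
            counts[h2] += 1
        # M-step: with phases "known", the MLE is the observed proportion.
        freq = {h: counts[h] / total for h in haplotypes}
        if it >= burn_in:
            for h in haplotypes:
                running[h] += freq[h]
    kept = n_iter - burn_in
    # Point estimate: average of the post-burn-in draws.
    return {h: running[h] / kept for h in haplotypes}


if __name__ == "__main__":
    sample = [(0, 0)] * 5 + [(2, 2)] * 5 + [(1, 1)] * 2
    for hap, p in sem_haplotype_frequencies(sample).items():
        print(hap, round(p, 3))
```

Unlike deterministic EM, each iteration replaces the expected haplotype counts with a single random draw of the phases, so the sequence of estimates forms a Markov chain rather than converging to a fixed point, and the post-burn-in average serves as the estimate. In this bare form a haplotype whose sampled count reaches zero stays at zero, so a practical version would need some safeguard for rare haplotypes.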
