Impact of Missing Genotype Data on Monte-Carlo Simulation Based Haplotype Analysis

In haplotype association analysis of unphased genotype data, methods based on Monte-Carlo simulation are often used when asymptotic theory is missing or inappropriate. Such methods are also an indispensable means of dealing with multiple-testing problems. We want to call attention to a potential trap in this usually useful approach: depending on the chosen test statistic, the simulation approach may lead to strongly inflated type I error rates when missing rates differ between cases and controls. Here, we consider four different testing strategies for haplotype analysis of case-control data. We recommend interpreting results for data sets with non-comparable distributions of missing genotypes with special caution when the test statistic is based on haplotypes inferred per individual. Moreover, our results are important for the conduct and interpretation of genome-wide association studies.
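
As a rough illustration of the kind of Monte-Carlo procedure discussed above, the following minimal sketch estimates a p-value by repeatedly shuffling case-control labels and recomputing a test statistic. It is not one of the four testing strategies considered here. The function names, the absolute mean difference statistic, and the per-individual haplotype counts are all hypothetical and chosen only for illustration.

```python
import random


def abs_mean_difference(values, labels):
    """Hypothetical statistic: |mean in cases - mean in controls|."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def mc_permutation_pvalue(values, labels, statistic, n_sim=999, seed=0):
    """Monte-Carlo p-value obtained by shuffling case-control labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # random reassignment of case/control status
        if statistic(values, shuffled) >= observed:
            exceed += 1
    # Counting the observed data as one replicate keeps the p-value above zero.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical per-individual counts (0, 1 or 2) of one inferred haplotype.
counts = [2] * 20 + [0] * 20
status = [1] * 20 + [0] * 20  # 1 = case, 0 = control
p_value = mc_permutation_pvalue(counts, status, abs_mean_difference)
```

Shuffling labels treats all individuals as exchangeable under the null hypothesis. The sketch assumes complete data and does not model missing genotypes, so it does not reproduce the type I error inflation described above.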
