Quantifying the Percent Increase in Minimum Sample Size for SNP Genotyping Errors in Genetic Model-Based Association Studies

Kang et al. [Genet Epidemiol 2004;26:132–141] addressed the question of which genotype misclassification errors are most costly, in terms of the minimum percentage increase in sample size necessary (%MSSN) to maintain constant asymptotic power and significance level, when performing case/control studies of genetic association in a genetic model-free setting. They answered the question for single nucleotide polymorphisms (SNPs) using the 2 × 3 χ² test of independence. We address the same question here for a genetic model-based framework. The genetic model parameters considered are: disease model (dominant, recessive), genotypic relative risk, SNP (marker) and disease allele frequencies, and linkage disequilibrium. The %MSSN coefficient of each of the six possible error rates is determined by expanding the non-centrality parameter of the asymptotic distribution of the 2 × 3 χ² test under a specified alternative hypothesis, which approximates %MSSN by a linear Taylor series in the error rates. In this work we assume that the rates of errors misclassifying one homozygote as the other homozygote are 0, since these errors are thought to occur rarely in practice. We find that there are settings of the genetic model parameters that lead to large total %MSSN for both dominant and recessive models. As the SNP minor allele frequency approaches 0, total %MSSN increases without bound, independent of the other genetic model parameters. In general, %MSSN is a complex function of the genetic model parameters. Use of SNPs with small minor allele frequency requires careful attention to the frequency of genotyping errors to ensure that power specifications are met. Software to perform these calculations for study design is available, and an example of its use to study a disease is given.
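The relationship between error rates and %MSSN described above can be illustrated with a minimal Python sketch. It is not the authors' software. It assumes one common form of the per-subject non-centrality parameter for the 2 × 3 χ² test of independence, and it takes case and control genotype frequencies directly as inputs rather than deriving them from a disease model and linkage disequilibrium. All function names and the numeric genotype frequencies are hypothetical and chosen only for illustration. Because the required non-centrality is fixed once power and significance level are fixed, and non-centrality grows linearly with sample size, the sketch computes %MSSN as 100 × (λ without errors / λ with errors − 1).

```python
# Illustrative sketch only: names and numeric inputs are assumptions,
# not taken from the paper or its software.

def ncp_per_subject(p_case, p_ctrl, frac_case=0.5):
    """Per-subject non-centrality of the 2 x 3 chi-square test of independence.

    p_case, p_ctrl: genotype frequencies (three values each, summing to 1).
    frac_case: assumed fraction of the total sample that are cases.
    """
    lam = 0.0
    for pc, pu in zip(p_case, p_ctrl):
        pbar = frac_case * pc + (1.0 - frac_case) * pu  # pooled frequency
        lam += frac_case * (pc - pbar) ** 2 / pbar
        lam += (1.0 - frac_case) * (pu - pbar) ** 2 / pbar
    return lam


def error_matrix(e12, e21, e23, e32):
    """Row i, column j = Pr(observed genotype j | true genotype i).

    The two homozygote-to-homozygote rates are fixed at 0, as in the text.
    """
    return [
        [1.0 - e12, e12, 0.0],
        [e21, 1.0 - e21 - e23, e23],
        [0.0, e32, 1.0 - e32],
    ]


def misclassify(p, err):
    """Observed genotype frequencies after applying the error matrix."""
    return [sum(p[i] * err[i][j] for i in range(3)) for j in range(3)]


def pct_mssn(p_case, p_ctrl, err, frac_case=0.5):
    """Percent increase in sample size needed to keep power constant."""
    lam_true = ncp_per_subject(p_case, p_ctrl, frac_case)
    lam_obs = ncp_per_subject(
        misclassify(p_case, err), misclassify(p_ctrl, err), frac_case
    )
    return 100.0 * (lam_true / lam_obs - 1.0)


if __name__ == "__main__":
    # Hypothetical genotype frequencies (common homozygote, heterozygote,
    # rare homozygote) for cases and controls.
    cases = [0.50, 0.40, 0.10]
    controls = [0.64, 0.32, 0.04]

    # Total %MSSN when each of the four nonzero error rates is 1%.
    print(pct_mssn(cases, controls, error_matrix(0.01, 0.01, 0.01, 0.01)))

    # Finite-difference estimate of the linear coefficient of one error rate.
    h = 1e-6
    print(pct_mssn(cases, controls, error_matrix(h, 0.0, 0.0, 0.0)) / h)
```

The finite-difference slope in the last lines is a numerical stand-in for the first-order Taylor coefficient of a single error rate. Under this approximation, total %MSSN is roughly the sum of each coefficient multiplied by its error rate.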

[1] I. Bross. Misclassification in 2 × 2 Tables, 1954.

[2] R. Lewontin. The Interaction of Selection and Linkage. I. General Considerations; Heterotic Models, 1964, Genetics.

[3] R. Anderson, et al. An Investigation of the Effect of Misclassification on the Properties of χ²-Tests in the Analysis of Categorical Data, 1965, Biometrika.

[4] W. G. Cochran. Errors of Measurement in Statistics, 1968.

[5] D. Hartl, et al. Principles of Population Genetics, 1981.

[6] R. C. Elston, et al. Lods, wrods, and mods: The interpretation of lod scores calculated under different models, 1994, Genetic Epidemiology.

[7] S. E. Hodge, et al. Magnitude of type I error when single-locus linkage analysis is maximized over models: a simulation study, 1997, American Journal of Human Genetics.

[8] P. Sham. Statistics in Human Genetics, 1997.

[9] Craig R. Miller, et al. Assessing allelic dropout and genotype reliability using maximum likelihood, 2002, Genetics.

[10] J. Ott, et al. Power and Sample Size Calculations for Case-Control Genetic Association Tests when Errors Are Present: Application to Single Nucleotide Polymorphisms, 2002, Human Heredity.

[11] Sheryl Zimmerman, et al. The public health impact of Alzheimer's disease, 2000-2050: potential implication of treatment advances, 2002, Annual Review of Public Health.

[12] Derek Gordon, et al. Errors and Linkage Disequilibrium Interact Multiplicatively When Computing Sample Sizes for Genetic Case-Control Association Studies, 2002, Pacific Symposium on Biocomputing.

[13] Stephen J. Finch, et al. What SNP genotyping errors are most costly for genetic association studies?, 2004, Genetic Epidemiology.