Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies

Background: Phenotype error reduces the power to detect genetic association. We quantify the effect of phenotype error, also known as diagnostic error, on power and sample size calculations for case-control genetic association studies between a marker locus and a disease phenotype. We consider the classic Pearson chi-square test for independence as our test of genetic association. To determine asymptotic power analytically, we compute the non-centrality parameter of the test statistic's asymptotic chi-square distribution, which is a function of the case and control sample sizes, genotype frequencies, disease prevalence, and phenotype misclassification probabilities. We derive the non-centrality parameter in the presence of phenotype errors and equivalent formulas for misclassification cost (the percentage increase in minimum sample size needed to maintain constant asymptotic power at a fixed significance level for each percentage increase in a given misclassification parameter). We use a linear Taylor series approximation for the cost of phenotype misclassification to determine lower bounds for the relative costs of misclassifying a true affected (respectively, unaffected) as a control (respectively, case). Power is verified by computer simulation.

Results: Our major findings are that: (i) the median absolute difference between analytic power with our method and simulation power was 0.001, and no absolute difference exceeded 0.011; (ii) as the disease prevalence approaches 0, the cost of misclassifying an unaffected as a case becomes infinitely large, while the cost of misclassifying an affected as a control approaches 0.

Conclusion: Our work enables researchers to quantify power loss and minimum sample size requirements in the presence of phenotype errors, thereby allowing for more realistic study design. For most diseases of current interest, verifying that cases are correctly classified is of paramount importance.
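The calculation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and it makes several assumptions that do not come from the text: the symbols `theta` (probability that a true affected is classified as a control) and `phi` (probability that a true unaffected is classified as a case), misclassification that is independent of genotype, three genotype classes (so the test has 2 degrees of freedom), and hypothetical genotype frequencies. The non-centrality parameter uses the standard form for a Pearson chi-square test on a 2 x k table, λ = N_A N_U Σ_j (p_Aj − p_Uj)² / (N_A p_Aj + N_U p_Uj), evaluated at the genotype frequencies of the classified groups.

```python
import math

def observed_freqs(p_aff, p_unaff, K, theta, phi):
    """Genotype frequencies in the *classified* case and control groups.

    K     : disease prevalence (0 < K < 1)
    theta : Pr(classified as control | truly affected)   (assumed notation)
    phi   : Pr(classified as case | truly unaffected)    (assumed notation)
    """
    # Share of truly affected individuals among classified cases / controls.
    w_case = K * (1 - theta) / (K * (1 - theta) + (1 - K) * phi)
    w_ctrl = K * theta / (K * theta + (1 - K) * (1 - phi))
    cases = [w_case * a + (1 - w_case) * u for a, u in zip(p_aff, p_unaff)]
    ctrls = [w_ctrl * a + (1 - w_ctrl) * u for a, u in zip(p_aff, p_unaff)]
    return cases, ctrls

def noncentrality(n_case, n_ctrl, cases, ctrls):
    """Non-centrality parameter of the Pearson chi-square test on a 2 x k table."""
    return n_case * n_ctrl * sum(
        (c - d) ** 2 / (n_case * c + n_ctrl * d) for c, d in zip(cases, ctrls)
    )

def power_df2(ncp, alpha=0.05):
    """Asymptotic power for 2 degrees of freedom (three genotype classes).

    Uses the Poisson-mixture form of the non-central chi-square tail; with
    even degrees of freedom every central tail has a closed form.
    Very large ncp (above roughly 1400) would underflow exp(-ncp / 2).
    """
    half_x = -math.log(alpha)      # x/2, where x is the 2-df critical value
    half_l = ncp / 2.0
    pois = math.exp(-half_l)       # Poisson(i; ncp/2) weight, starting at i = 0
    term = math.exp(-half_x)       # exp(-x/2) * (x/2)^i / i!, starting at i = 0
    tail = term                    # central chi-square tail with 2 + 2i df
    total = 0.0
    n_terms = int(half_l + 10 * math.sqrt(half_l) + 50)
    for i in range(n_terms):
        total += pois * tail
        pois *= half_l / (i + 1)
        term *= half_x / (i + 1)
        tail += term
    return total

def sample_size_inflation(p_aff, p_unaff, K, theta, phi):
    """Factor by which total sample size must grow to keep power constant.

    With a fixed case:control ratio the non-centrality parameter is linear in
    sample size, so the factor is the ratio of per-unit non-centralities.
    """
    ncp_clean = noncentrality(1, 1, p_aff, p_unaff)
    ncp_error = noncentrality(1, 1, *observed_freqs(p_aff, p_unaff, K, theta, phi))
    return ncp_clean / ncp_error

if __name__ == "__main__":
    # Hypothetical genotype frequencies, chosen only for illustration.
    p_aff, p_unaff = [0.49, 0.42, 0.09], [0.64, 0.32, 0.04]
    for K in (0.2, 0.01):
        print(K,
              sample_size_inflation(p_aff, p_unaff, K, theta=0.01, phi=0.0),
              sample_size_inflation(p_aff, p_unaff, K, theta=0.0, phi=0.01))
```

Under these assumptions, lowering `K` in the demo makes the inflation factor for `phi` grow while the factor for `theta` shrinks toward 1, which is the qualitative pattern reported in finding (ii). The function `sample_size_inflation` returns a ratio of sample sizes, not the per-percent cost defined in the abstract.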
