Recovering unused information in genome-wide association studies: the benefit of analyzing SNPs out of Hardy–Weinberg equilibrium

Although rapid advances in high-throughput genotyping technology have made genome-wide association studies possible, these studies remain expensive, especially given the large sample sizes needed to detect the small to moderate effect sizes that characterize complex diseases. It is therefore prudent to use all of the information contained in a genome-wide scan. We propose a straightforward analytical approach that tests often unused SNP data without sacrificing statistical validity. We simulate genotype miscalls under a variety of models consistent with observed miscall rates and test for departures from Hardy–Weinberg equilibrium (HWE) using the standard Pearson's χ²-test. We find that true disease susceptibility loci subjected to various patterns of genotype miscalls can be substantially out of HWE and thus be candidates for removal before association testing. We demonstrate that these loci can retain sufficient statistical power even under extreme error models. We additionally show that random miscalls of null SNPs, independent of the phenotype, do not induce bias in case–control or cohort studies, and we suggest that a significant HWE test should not prevent a SNP from being tested in genome-wide association studies in these scenarios. However, association findings for SNPs that are out of HWE must be treated more carefully than ‘regular’ findings, for example, by re-genotyping the SNP in the same study using a different genotyping technology.
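As an illustration of the HWE check referred to above, the following is a minimal Python sketch of a Pearson χ² goodness-of-fit test on genotype counts at a biallelic SNP, using one degree of freedom. The function names `hwe_chi_square` and `miscall_hets`, and the toy error model in which a fixed fraction of heterozygotes is miscalled as one homozygote, are assumptions introduced here for illustration; they are not the miscall models or code used in the study.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test for departure from HWE (1 degree of freedom).

    Takes observed counts of the three genotypes at a biallelic SNP and
    returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    # Allele frequency of A estimated from the observed genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    # Genotype counts expected under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def miscall_hets(n_aa, n_ab, n_bb, rate):
    """Hypothetical toy error model (an assumption, not the paper's):
    a fraction `rate` of heterozygotes is miscalled as AA homozygotes."""
    moved = round(n_ab * rate)
    return n_aa + moved, n_ab - moved, n_bb


if __name__ == "__main__":
    true_counts = (100, 200, 100)  # exactly in HWE with p = 0.5
    print(hwe_chi_square(*true_counts))
    print(hwe_chi_square(*miscall_hets(*true_counts, rate=0.10)))
```

Under the approach the abstract suggests, a small p-value from such a test would flag the SNP for more careful follow-up rather than exclude it from association testing.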
