A Powerful Method for Including Genotype Uncertainty in Tests of Hardy-Weinberg Equilibrium

Posterior probabilities are widely used to summarize genotype uncertainty across genotyping, sequencing, and imputation platforms. Prior work in many contexts has shown the value of carrying this uncertainty, in the form of posterior probabilities, into downstream statistical tests. Typical approaches to incorporating genotype uncertainty when testing Hardy-Weinberg equilibrium tend to have poorly calibrated type I error rates, especially as genotype uncertainty increases. We propose a new approach, in the spirit of genomic control, that properly calibrates the type I error rate while improving power to detect deviations from Hardy-Weinberg equilibrium. We demonstrate the improved performance of our method on both simulated and real genotypes.
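To make the setting concrete, the following is a minimal sketch of one naive way to fold posterior probabilities into a Hardy-Weinberg test: sum each sample's posterior probabilities into expected genotype counts, then apply the standard one-degree-of-freedom chi-square test. It is not the method proposed here, only an illustration of the kind of expected-count approach whose calibration can degrade as uncertainty grows. The function names and the (P(AA), P(AB), P(BB)) input layout are assumptions made for this example.

```python
import math


def expected_counts(posteriors):
    """Sum per-sample posteriors (P(AA), P(AB), P(BB)) into expected genotype counts."""
    n_aa = sum(p[0] for p in posteriors)
    n_ab = sum(p[1] for p in posteriors)
    n_bb = sum(p[2] for p in posteriors)
    return n_aa, n_ab, n_bb


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Standard 1-df chi-square test of Hardy-Weinberg proportions.

    Counts may be fractional (expected counts from posteriors).
    Returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n <= 0:
        raise ValueError("no samples")
    p = (2.0 * n_aa + n_ab) / (2.0 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or q <= 0.0:
        raise ValueError("monomorphic site: test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2.0 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical posteriors for four samples, each row summing to 1.
    samples = [
        (0.90, 0.10, 0.00),
        (0.05, 0.90, 0.05),
        (0.10, 0.80, 0.10),
        (0.00, 0.20, 0.80),
    ]
    print(hwe_chi_square(*expected_counts(samples)))
```

Because the chi-square reference distribution assumes integer counts of known genotypes, treating fractional expected counts as if they were observed ignores the extra variability contributed by the posteriors, which is one reason a calibration step may be needed.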
