A Simple and Fast Two-Locus Quality Control Test to Detect False Positives Due to Batch Effects in Genome-Wide Association Studies

Erroneous genotypes that pass standard quality control (QC) can severely affect genome-wide association studies, genotype imputation, and the estimation of heritability and prediction of genetic risk based on single nucleotide polymorphisms (SNPs). To detect such genotyping errors, a simple two-locus QC method was developed and applied, based on the difference between the association test statistic for single SNPs and that for pairs of SNPs. In real data, the proposed approach detected many problematic SNPs with statistical significance even when standard single-SNP QC analyses failed to detect them. Depending on the data set, the number of erroneous SNPs that passed standard single-SNP QC but were detected by the proposed approach ranged from a few hundred to thousands. In simulated data, the proposed method was powerful and outperformed the other existing methods tested. Its power to detect erroneous genotypes was approximately 80% at a 3% error rate per SNP. This novel QC approach is easy to implement and computationally efficient, and can improve the quality of genotypes for subsequent genotype-phenotype investigations. Genet. Epidemiol. 34:854–862, 2010. © 2010 Wiley-Liss, Inc.
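
The abstract does not spell out the test statistic, so the following is only a minimal sketch of one plausible reading of "the difference in test statistic of association between single SNPs and pairs of SNPs". It assumes genotypes coded 0/1/2, a Pearson chi-square on case/control contingency tables, and a pair-level table formed from joint genotypes. The function names `pearson_chi2` and `two_locus_excess` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def pearson_chi2(cases, controls):
    """Pearson chi-square for a 2 x k table built from two non-empty lists of labels."""
    case_counts = Counter(cases)
    control_counts = Counter(controls)
    categories = set(case_counts) | set(control_counts)
    n_case, n_control = len(cases), len(controls)
    total = n_case + n_control
    stat = 0.0
    for cat in categories:
        col = case_counts[cat] + control_counts[cat]
        for observed, row in ((case_counts[cat], n_case),
                              (control_counts[cat], n_control)):
            expected = row * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


def two_locus_excess(cases_a, cases_b, controls_a, controls_b):
    """Assumed statistic: joint-genotype chi-square minus the larger single-SNP chi-square."""
    single_a = pearson_chi2(cases_a, controls_a)
    single_b = pearson_chi2(cases_b, controls_b)
    # Joint genotype of the SNP pair, one tuple per individual.
    pair = pearson_chi2(list(zip(cases_a, cases_b)),
                        list(zip(controls_a, controls_b)))
    return pair - max(single_a, single_b)


# Toy data: each SNP alone looks identical in cases and controls,
# but the joint genotype pattern differs between the two groups.
excess = two_locus_excess([0, 0, 1, 1], [0, 0, 1, 1],
                          [0, 1, 0, 1], [1, 0, 1, 0])
print(excess)
```

In this toy example both single-SNP statistics are 0 while the joint statistic is 8.0, so the sketch flags a discrepancy that a single-SNP check would miss. How the paper assesses statistical significance of such a difference is not stated in the abstract and is not modeled here.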