Spoiling the whole bunch: quality control aimed at preserving the integrity of high-throughput genotyping.

False-positive and false-negative results caused by undetected genotyping errors and confounding factors are a constant challenge for genome-wide association studies (GWAS), because the signals associated with complex phenotypes are weak and high-throughput genotyping is noisy. In the context of the Genetics of Kidneys in Diabetes (GoKinD) study, we identify a source of error in genotype calling and demonstrate that a standard battery of quality-control (QC) measures is not sufficient to detect or correct it. We show that, if genotyping and calling are done by plate (batch), even a few DNA samples of marginally acceptable quality can profoundly alter the allele calls for the other samples on the plate. This in turn leads to significant differential bias in estimates of allele frequency between plates and, potentially, to false-positive associations, particularly when case and control samples are not sufficiently randomized across plates. The problem may become widespread as investigators tap into existing public databases for GWAS control samples. We describe how to detect and correct this bias by using additional sources of information, including raw signal-intensity data.
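As an illustration of what differential bias in allele frequency between plates means in practice, the following minimal sketch tests one SNP for heterogeneity of allele counts across plates with a Pearson chi-square statistic. It is not the procedure used in the study, which also draws on raw signal-intensity data. The function names, the 0/1/2 genotype coding, and the plate labels are assumptions made for the example, and the allele-level test treats the two alleles of each sample as independent, which is a simplification.

```python
from collections import defaultdict


def plate_allele_counts(genotypes, plates):
    """Tally allele counts per plate for a single SNP.

    genotypes: calls coded as the number of B alleles (0, 1 or 2),
               or None for a no-call (hypothetical coding).
    plates:    plate label for each sample, same length as genotypes.
    Returns a dict mapping plate label -> [count_A, count_B].
    """
    counts = defaultdict(lambda: [0, 0])
    for call, plate in zip(genotypes, plates):
        if call is None:  # skip no-calls
            continue
        counts[plate][0] += 2 - call
        counts[plate][1] += call
    return dict(counts)


def plate_chi_square(counts):
    """Pearson chi-square statistic for equal allele frequency across plates.

    Returns (statistic, degrees_of_freedom). Returns (0.0, 0) when the
    test is undefined (fewer than two non-empty plates or a monomorphic SNP).
    """
    rows = [(a, b) for a, b in counts.values() if a + b > 0]
    total_a = sum(a for a, _ in rows)
    total_b = sum(b for _, b in rows)
    total = total_a + total_b
    if len(rows) < 2 or total_a == 0 or total_b == 0:
        return 0.0, 0
    stat = 0.0
    for a, b in rows:
        n = a + b
        for observed, column_total in ((a, total_a), (b, total_b)):
            expected = n * column_total / total
            stat += (observed - expected) ** 2 / expected
    return stat, len(rows) - 1


# Toy data: two plates whose calls disagree completely.
calls = [0, 0, 0, 0, 2, 2, 2, 2]
labels = ["P1"] * 4 + ["P2"] * 4
statistic, dof = plate_chi_square(plate_allele_counts(calls, labels))
```

A large statistic relative to a chi-square distribution with the returned degrees of freedom flags a SNP whose calls differ by plate. In a real analysis this would be run across all SNPs, and any flagged plate would then be checked against the intensity data before recalling or excluding samples.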
