Batch effects in the BRLMM genotype calling algorithm influence GWAS results for the Affymetrix 500K array

The Affymetrix GeneChip Human Mapping 500K array is commonly used for genome-wide association studies (GWASs). Recent findings highlight the importance of accurate genotype calling algorithms in limiting inflation of Type I and Type II error rates. Differences in results caused by genotype calling errors can introduce severe bias into case–control association studies. This study uses data from the Wellcome Trust Case Control Consortium, in which 1991 individuals with coronary artery disease (CAD) and 1500 controls from the UK Blood Services (NBS) were genotyped on the Affymetrix 500K array. Different batch sizes and compositions were used in the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) genotype calling algorithm to assess the batch effect on downstream association analysis. Results show that batch composition (cases and controls called together or separately) and batch size (the number of individuals processed by BRLMM at a time) can create 2–3% discordance in quality control and statistical analysis results, and may contribute to the lack of reproducibility between GWASs. Changes in batch size are largely responsible for the differing single-nucleotide polymorphism (SNP) results, yet we also observe evidence of an interaction between batch size and composition that contributes to discordance in the list of significantly associated loci.
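One way to make the notion of discordance concrete is to compare, SNP by SNP, a yes/no outcome (for example, whether the SNP passed quality control or reached significance) under two batch strategies. The sketch below is illustrative only: the function name `status_discordance`, the SNP identifiers, and the choice to compare only SNPs present in both runs are assumptions, and the study's own definition of discordance may differ.

```python
from typing import Mapping


def status_discordance(run_a: Mapping[str, bool], run_b: Mapping[str, bool]) -> float:
    """Return the fraction of shared SNPs whose yes/no status differs between runs.

    Each mapping goes from a SNP identifier to a boolean outcome, such as
    "passed QC" or "significantly associated". This definition is an
    assumption made for illustration, not the one used in the study.
    """
    # Only SNPs reported by both runs can be compared.
    shared = run_a.keys() & run_b.keys()
    if not shared:
        raise ValueError("the two runs share no SNPs")
    differing = sum(1 for snp in shared if run_a[snp] != run_b[snp])
    return differing / len(shared)


# Hypothetical outcomes for four SNPs under two batch strategies.
combined_batch = {"snp1": True, "snp2": True, "snp3": False, "snp4": True}
separate_batch = {"snp1": True, "snp2": False, "snp3": False, "snp4": True}

rate = status_discordance(combined_batch, separate_batch)
print(f"discordance: {rate:.2%}")
```

In this toy input, one of four shared SNPs changes status, so the rate is 25%. The same comparison applied across all SNPs on the array would give a figure comparable in kind to the 2–3% reported above.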
