Estimates of array and pool-construction variance for planning efficient DNA-pooling genome-wide association studies

Background
Until recently, genome-wide association studies (GWAS) have been restricted to research groups with the budget necessary to genotype hundreds, if not thousands, of samples. Replacing individual genotyping with genotyping of DNA pools in Phase I of a GWAS has proven successful and has dramatically improved the financial feasibility of this approach. In a pool-based GWAS, the precision with which SNP allele frequencies are estimated from a DNA pool influences the study's power to detect associations. Here we address how to control the variance in allele frequency estimation when DNAs are pooled, and how to plan and conduct the most efficient well-powered pool-based GWAS.

Methods
By examining the variation in allele frequency estimates on SNP arrays between and within DNA pools, we determine how array variance [var(e_array)] and pool-construction variance [var(e_construction)] contribute to the total variance of allele frequency estimation. This information is useful in deciding whether replicate arrays or replicate pools are more effective at reducing variance. Our analysis is based on 27 DNA pools ranging in size from 74 to 446 individual samples, genotyped on a collective total of 128 Illumina BeadArrays: 24 1M-Single, 32 1M-Duo, and 72 660-Quad.

Results
For all three Illumina SNP array types, our estimates of var(e_array) were similar, between 3 × 10⁻⁴ and 4 × 10⁻⁴ for normalized data. In normalized data, var(e_construction) accounted for 20% to 40% of the pooling variance across the 27 pools.

Conclusions
We conclude that, relative to var(e_array), var(e_construction) is of less importance in reducing the variance in allele frequency estimation from DNA pools; however, our data suggest that on average it may be more important than previously thought. We have prepared a simple online tool, PoolingPlanner (available at http://www.kchew.ca/PoolingPlanner/), which calculates the effective sample size (ESS) of a DNA pool given a range of replicate array values. ESS can be used in a power calculator to perform pool-adjusted power calculations. This allows one to quickly estimate the loss of power associated with a pooling experiment and to make an informed decision on whether a pool-based GWAS is worth pursuing.
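The variance decomposition and ESS described above can be illustrated with a short Python sketch. It assumes a simple additive error model, p_hat = p + e_sampling + e_construction + e_array, in which sampling variance is p(1 - p)/(2N) for N diploid individuals and averaging k replicate arrays of one pool divides var(e_array) by k. This model, the function names, and the two-pool estimator for var(e_construction) are illustrative assumptions, not necessarily the method used in this study or the formula implemented in PoolingPlanner. Here ESS is taken to be the number of individually genotyped samples whose sampling variance alone equals the total variance of the pooled estimate.

```python
"""Sketch: split pooling variance into array and construction parts, then convert to ESS.

Assumed (hypothetical) additive error model for one SNP measured in one pool:
    p_hat = p + e_sampling + e_construction + e_array
where var(e_sampling) = p(1 - p) / (2N) for N diploid individuals, and averaging
k replicate arrays of the same pool divides var(e_array) by k.
"""
from statistics import mean, variance


def estimate_array_variance(arrays_per_snp: list[list[float]]) -> float:
    """Within-pool estimate of var(e_array).

    Each inner list holds allele frequencies read from k >= 2 replicate arrays
    of the SAME pool at one SNP; the per-SNP sample variances are averaged.
    """
    return mean(variance(reps) for reps in arrays_per_snp)


def estimate_construction_variance(
    pool_a_means: list[float], pool_b_means: list[float], var_array: float, k: int
) -> float:
    """Between-pool estimate of var(e_construction) (hypothetical design).

    Assumes pools A and B were built independently from the SAME individuals and
    each was read on k arrays, so sampling error cancels in the per-SNP difference
    d = a - b and E[d^2] = 2 * var_construction + 2 * var_array / k.
    """
    mean_sq_diff = mean((a - b) ** 2 for a, b in zip(pool_a_means, pool_b_means))
    # Method-of-moments estimate, truncated at zero if array noise dominates.
    return max(0.0, mean_sq_diff / 2 - var_array / k)


def effective_sample_size(
    n: int, p: float, var_array: float, var_construction: float, k: int = 1
) -> float:
    """Individually genotyped sample size with the same allele-frequency variance."""
    sampling = p * (1 - p) / (2 * n)
    total = sampling + var_construction + var_array / k
    return p * (1 - p) / (2 * total)


if __name__ == "__main__":
    # Illustrative values only: var(e_array) lies in the 3-4 x 10^-4 range reported
    # above; var(e_construction) = 1.5e-4 is an assumed value, not a study estimate.
    for k in (1, 2, 4):
        ess = effective_sample_size(
            n=400, p=0.5, var_array=3.5e-4, var_construction=1.5e-4, k=k
        )
        print(f"k={k} arrays: ESS ~ {ess:.0f} of 400 individuals")
```

With these illustrative inputs, ESS rises from about 154 with one array to about 227 with four. The gain flattens because, under this model, replicate arrays shrink only the array term, while var(e_construction) is incurred once per pool and can be reduced only by constructing replicate pools.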
