Effect of population stratification analysis on false-positive rates for common and rare variants

Principal components analysis (PCA) has been successfully used to correct for population stratification in genome-wide association studies of common variants. However, rare variants also have a role in common disease etiology. Whether PCA successfully controls population stratification for rare variants has not been addressed. Thus we evaluate the effect of population stratification analysis on false-positive rates for common and rare variants at the single-nucleotide polymorphism (SNP) and gene level. We use the simulation data from Genetic Analysis Workshop 17 and compare false-positive rates with and without PCA at the SNP and gene level. We found that SNPs’ minor allele frequency (MAF) influenced the ability of PCA to effectively control false discovery. Specifically, PCA reduced false-positive rates more effectively in common SNPs (MAF > 0.05) than in rare SNPs (MAF < 0.01). Furthermore, at the gene level, although false-positive rates were reduced, power to detect true associations was also reduced using PCA. Taken together, these results suggest that sequence-level data should be interpreted with caution, because extremely rare SNPs may exhibit sporadic association that is not controlled using PCA.

[1]  M. Feldman,et al.  Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure , 2005, PLoS genetics.

[2]  K. Roeder,et al.  Genomic Control for Association Studies , 1999, Biometrics.

[3]  David B. Goldstein,et al.  Rare Variants Create Synthetic Genome-Wide Associations , 2010, PLoS biology.

[4]  Lisa J. Martin,et al.  Application of genetic/genomic approaches to allergic disorders. , 2010, The Journal of allergy and clinical immunology.

[5]  E. Zeggini,et al.  An Evaluation of Statistical Approaches to Rare Variant Analysis in Genetic Association Studies , 2009, Genetic epidemiology.

[6]  K. Frazer,et al.  Common vs. rare allele hypotheses for complex diseases. , 2009, Current opinion in genetics & development.

[7]  D. Goldstein,et al.  Uncovering the roles of rare variants in common disease through whole-genome sequencing , 2010, Nature Reviews Genetics.

[8]  John S Witte,et al.  Point: population stratification: a problem for case-control studies of candidate-gene associations? , 2002, Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology.

[9]  D. Reich,et al.  Principal components analysis corrects for stratification in genome-wide association studies , 2006, Nature Genetics.

[10]  Claudia Hemmelmann,et al.  Statistical analysis of rare sequence variants: an overview of collapsing methods , 2011, Genetic epidemiology.

[11]  Mark D Shriver,et al.  Control of confounding of genetic associations in stratified populations. , 2003, American journal of human genetics.

[12]  Juan Manuel Peralta,et al.  Genetic Analysis Workshop 17 mini-exome simulation , 2011, BMC proceedings.

[13]  D. Nickerson,et al.  Tracing Sub-Structure in the European American Population with PCA-Informative Markers , 2008, PLoS genetics.