Inflated type I error rates when using aggregation methods to analyze rare variants in the 1000 Genomes Project exon sequencing data in unrelated individuals: summary results from Group 7 at Genetic Analysis Workshop 17

As part of Genetic Analysis Workshop 17 (GAW17), our group applied novel and standard approaches to the analysis of genotype‐phenotype association in next‐generation sequencing data. We identified a major issue in the analysis of the GAW17 data: type I error and false‐positive report probability rates were higher than those expected based on empirical type I error levels (as high as 90%). Two main causes emerged: population stratification and long‐range correlation (gametic phase disequilibrium) between rare variants. Population stratification was expected because of the diverse sample. Correlation between rare variants was attributable to both random causes (e.g., nearly 10,000 of 25,000 markers were private variants, and the sample size was small [n = 697]) and nonrandom causes (more correlation was observed than expected by chance). Principal components analysis was used to control for population structure and helped reduce type I errors, but at the expense of identifying fewer causal variants. A novel multiple regression approach showed promise for handling correlation between markers. Further work is needed, first to identify best practices for controlling type I errors in the analysis of sequencing data, and then to explore and compare the many promising new aggregation approaches for identifying markers associated with disease phenotypes. Genet. Epidemiol. 35:S56–S60, 2011. © 2011 Wiley Periodicals, Inc.
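To illustrate why private variants in a small sample can be correlated purely by chance, the following minimal sketch may help. It is not the group's analysis. It assumes, for illustration only, that each private variant is a singleton carried by exactly one individual and that carriers are drawn uniformly at random. Under those assumptions, two private variants carried by the same person have identical genotype vectors and a correlation of 1, so roughly 10,000 such variants spread over 697 people should produce on the order of 71,700 perfectly correlated pairs. All variable names are hypothetical.

```python
import numpy as np

# Figures quoted in the abstract; everything else is an illustrative assumption.
n_individuals = 697
n_private = 10_000  # "nearly 10,000" private variants, rounded for the sketch

# Assumption: each private variant is a singleton whose single carrier is
# chosen uniformly at random. Two singletons share a carrier with
# probability 1 / n_individuals.
expected_pairs = n_private * (n_private - 1) / 2 / n_individuals

# Simulate the carrier of each private variant and count same-carrier pairs.
rng = np.random.default_rng(17)
carrier = rng.integers(0, n_individuals, size=n_private)
per_person = np.bincount(carrier, minlength=n_individuals)
observed_pairs = int((per_person * (per_person - 1) // 2).sum())


def singleton_genotype(carrier_index: int) -> np.ndarray:
    """Genotype vector with one heterozygous carrier and zeros elsewhere."""
    g = np.zeros(n_individuals)
    g[carrier_index] = 1.0
    return g


# Same carrier: identical vectors, so the correlation is 1.
r_same = np.corrcoef(singleton_genotype(0), singleton_genotype(0))[0, 1]
# Different carriers: slightly negative correlation, -1 / (n_individuals - 1).
r_diff = np.corrcoef(singleton_genotype(0), singleton_genotype(1))[0, 1]

print(f"expected perfectly correlated pairs: {expected_pairs:.0f}")
print(f"simulated perfectly correlated pairs: {observed_pairs}")
print(f"r (same carrier) = {r_same:.3f}, r (different carriers) = {r_diff:.5f}")
```

The simulated count varies with the random seed but stays close to the analytic expectation. This covers only the random component described above; the nonrandom excess correlation reported by the group is not modeled here.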
