The Power of Gene-Based Rare Variant Methods to Detect Disease-Associated Variation and Test Hypotheses About Complex Disease

Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses that differ in the number, frequency, and effect sizes of the underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α = 2.5×10⁻⁶) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture drawn from sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.
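
To make the notion of simulation-based power concrete, the following is a minimal sketch, not the simulation framework used in the study. It assumes a deliberately simple collapsing test (a two-proportion z-test on the number of rare-variant carriers in cases versus controls) rather than any of the methods evaluated here, and it draws carrier status independently per individual instead of resampling 1000 Genomes haplotypes. The function names and the carrier frequencies in the usage note are hypothetical illustrations.

```python
import random
from statistics import NormalDist


def burden_pvalue(case_carriers, n_cases, control_carriers, n_controls):
    """Two-sided two-proportion z-test on carrier counts (a simple collapsing test)."""
    pooled = (case_carriers + control_carriers) / (n_cases + n_controls)
    if pooled <= 0.0 or pooled >= 1.0:
        return 1.0  # no variation in carrier status, so no evidence either way
    se = (pooled * (1.0 - pooled) * (1.0 / n_cases + 1.0 / n_controls)) ** 0.5
    z = (case_carriers / n_cases - control_carriers / n_controls) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def estimate_power(n_cases, n_controls, freq_cases, freq_controls,
                   alpha, n_reps=200, seed=0):
    """Monte Carlo power: fraction of simulated datasets with p-value below alpha.

    freq_cases and freq_controls are assumed carrier frequencies at one gene
    (hypothetical inputs, not values taken from the study).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Draw carrier status independently for every simulated individual.
        case_carriers = sum(rng.random() < freq_cases for _ in range(n_cases))
        control_carriers = sum(rng.random() < freq_controls for _ in range(n_controls))
        p = burden_pvalue(case_carriers, n_cases, control_carriers, n_controls)
        if p < alpha:
            hits += 1
    return hits / n_reps
```

As a usage illustration, `estimate_power(1500, 1500, 0.04, 0.02, alpha=2.5e-6)` estimates power at the exome-wide threshold for 3K individuals under an assumed doubling of carrier frequency in cases. Setting `freq_cases` equal to `freq_controls` estimates the false positive rate of the test at the chosen `alpha`.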
