Sequence kernel association tests for the combined effect of rare and common variants.

Recent developments in sequencing technologies have made it possible to uncover both rare and common genetic variants. Genome-wide association studies (GWASs) can test for the effect of common variants, whereas sequence-based association studies can evaluate the cumulative effect of both rare and common variants on disease risk. Many groupwise association tests, including burden tests and variance-component tests, have been proposed for this purpose. Although such tests do not exclude common variants from their evaluation, they focus mostly on the effect of rare variants, upweighting rare-variant effects and downweighting common-variant effects. They can therefore lose substantial power when both rare and common genetic variants in a region influence trait susceptibility. There is increasing evidence that the allelic spectrum of risk variants at a given locus might include novel, rare, low-frequency, and common genetic variants. Here, we introduce several sequence kernel association tests to evaluate the cumulative effect of rare and common variants. The proposed tests are computationally efficient and are applicable to both binary and continuous traits. Furthermore, they can readily combine GWAS and whole-exome-sequencing data on the same individuals, when available, and are also applicable to deep-resequencing data of GWAS loci. We evaluate these tests on data simulated under comprehensive scenarios and show that, compared with the most commonly used tests, including the burden and variance-component tests, they can achieve substantial increases in power. We next show applications to sequencing studies of Crohn disease and autism spectrum disorders. The proposed tests have been incorporated into the software package SKAT.
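The abstract contrasts burden and variance-component tests and notes that weighting schemes which upweight rare variants can lose power when common variants also affect the trait. The sketch below illustrates these ideas for a continuous trait. It rests on several simplifying assumptions. The null model is intercept-only, with no covariates. Each variant's score is multiplied by the Beta(MAF; 1, 25) density before squaring, which is the usual SKAT default for rare variants. P-values come from permutation, replacing the analytic mixture-of-chi-squares p-values that SKAT computes. All function names (`beta_weight`, `variant_scores`, `make_combined_stat`, and so on) are hypothetical. The combined statistic is also an illustrative choice, not the combination rule of the proposed tests or of the SKAT package. It adds a rare-variant part and a common-variant part after dividing each by its permutation-null mean.

```python
import math
import random


def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density at a minor allele frequency 0 < maf < 1.

    With the default a=1, b=25 the weight drops steeply as MAF grows, so rare
    variants are upweighted and common variants are nearly ignored.
    """
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(maf) + (b - 1) * math.log1p(-maf) - log_beta_fn)


def variant_scores(y, genotypes):
    """Per-variant scores S_j = sum_i g_ij * (y_i - ybar) under an intercept-only null."""
    ybar = sum(y) / len(y)
    resid = [yi - ybar for yi in y]
    return [sum(row[j] * r for row, r in zip(genotypes, resid))
            for j in range(len(genotypes[0]))]


def skat_stat(scores, weights):
    """Variance-component (SKAT-type) statistic: sum_j (w_j * S_j)^2."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


def burden_stat(scores, weights):
    """Burden statistic: (sum_j w_j * S_j)^2; opposite-direction effects can cancel."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2


def null_mean(y, genotypes, weights):
    """Exact permutation-null mean of skat_stat: sum_j w_j^2 * SS(g_j) * SS(y) / (n - 1)."""
    n = len(y)
    ybar = sum(y) / n
    ss_y = sum((yi - ybar) ** 2 for yi in y)
    total = 0.0
    for j, w in enumerate(weights):
        col = [row[j] for row in genotypes]
        gbar = sum(col) / n
        total += w ** 2 * sum((g - gbar) ** 2 for g in col)
    return total * ss_y / (n - 1)


def make_combined_stat(y, genotypes, rare_idx, common_idx, weights):
    """Hypothetical combination: rare-part plus common-part SKAT statistics,
    each scaled by its permutation-null mean so neither part dominates."""
    parts = []
    for idx in (rare_idx, common_idx):
        sub_w = [weights[j] for j in idx]
        sub_g = [[row[j] for j in idx] for row in genotypes]
        # SS(y) is unchanged by permuting y, so this scale can be computed once.
        parts.append((idx, sub_w, null_mean(y, sub_g, sub_w)))

    def stat(y_perm, geno):
        scores = variant_scores(y_perm, geno)
        return sum(skat_stat([scores[j] for j in idx], sub_w) / scale
                   for idx, sub_w, scale in parts if scale > 0)
    return stat


def permutation_pvalue(y, genotypes, stat_fn, n_perm=999, seed=1):
    """Permutation p-value: shuffling y breaks any genotype-trait association."""
    rng = random.Random(seed)
    observed = stat_fn(y, genotypes)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if stat_fn(y_perm, genotypes) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(2013)
    mafs = [0.005, 0.01, 0.02, 0.25, 0.40]  # three rare, two common (assumed values)
    n = 300
    genotypes = [[sum(rng.random() < m for _ in range(2)) for m in mafs] for _ in range(n)]
    # Simulated trait driven only by the first common variant (column 3), plus noise.
    y = [0.4 * row[3] + rng.gauss(0.0, 1.0) for row in genotypes]

    beta_w = [beta_weight(m) for m in mafs]      # true MAFs used for weights (assumption)
    mixed_w = beta_w[:3] + [1.0, 1.0]            # hypothetical: flat weights for common
    tests = {
        "SKAT, Beta(1,25) weights": lambda yy, g: skat_stat(variant_scores(yy, g), beta_w),
        "burden, Beta(1,25) weights": lambda yy, g: burden_stat(variant_scores(yy, g), beta_w),
        "combined rare + common": make_combined_stat(y, genotypes, [0, 1, 2], [3, 4], mixed_w),
    }
    for name, fn in tests.items():
        print(f"{name}: p = {permutation_pvalue(y, genotypes, fn):.3f}")
```

In this simulation the trait depends only on a common variant. The Beta(1, 25) weights give that variant a weight of roughly 0.025, compared with about 20 for the rare variants. The two rare-weighted statistics should therefore tend to give larger p-values than the combined statistic, although a single simulated dataset need not show this.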