A general framework for detecting disease associations with rare variants in sequencing studies.

Biological and empirical evidence suggests that rare variants account for a large proportion of the genetic contributions to complex human diseases. Recent technological advances in high-throughput sequencing platforms have made it possible for researchers to generate comprehensive information on rare variants in large samples. We provide a general framework for association testing with rare variants by combining mutation information across multiple variant sites within a gene and relating the enriched genetic information to disease phenotypes through appropriate regression models. Our framework covers all major study designs (i.e., case-control, cross-sectional, cohort and family studies) and all common phenotypes (e.g., binary, quantitative, and age at onset), and it allows arbitrary covariates (e.g., environmental factors and ancestry variables). We derive theoretically optimal procedures for combining rare mutations and construct suitable test statistics for various biological scenarios. The allele-frequency threshold can be fixed or variable. The effects of the combined rare mutations on the phenotype can be in the same direction or different directions. The proposed methods are statistically more powerful and computationally more efficient than existing ones. An application to a deep-resequencing study of drug targets led to a discovery of rare variants associated with total cholesterol. The relevant software is freely available.
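
The core idea described above (collapse the rare mutations within a gene into a single score, then relate that score to the phenotype through a regression model with covariates) can be illustrated with a minimal sketch. This is not the authors' software or their optimal procedure. It assumes a quantitative trait, an unweighted sum of minor-allele counts, a fixed allele-frequency threshold, and an asymptotic 1-df chi-square score test. The function name `burden_score_test` and all parameter names are hypothetical.

```python
import math
import numpy as np


def burden_score_test(y, G, X, maf_threshold=0.05, weights=None):
    """Illustrative burden-style score test for a quantitative trait.

    y : (n,) phenotype
    G : (n, m) minor-allele counts (0/1/2) at m variant sites in one gene
    X : (n, p) covariates (an intercept column is added here)
    Returns (chi-square statistic, p-value, number of variants combined).
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    X = np.asarray(X, dtype=float)
    n = y.shape[0]
    X1 = np.column_stack([np.ones(n), X])

    # Keep variants that are observed and fall at or below the fixed threshold.
    maf = G.mean(axis=0) / 2.0
    rare = (maf > 0) & (maf <= maf_threshold)
    if not rare.any():
        return float("nan"), float("nan"), 0

    # Combine the rare mutations into one score per subject.
    w = np.ones(int(rare.sum())) if weights is None else np.asarray(weights, dtype=float)[rare]
    score = G[:, rare] @ w

    def residualize(v):
        # Remove the part of v explained by the covariates (least squares).
        beta, *_ = np.linalg.lstsq(X1, v, rcond=None)
        return v - X1 @ beta

    r_y = residualize(y)
    r_s = residualize(score)

    # Score statistic under the null model (covariates only).
    sigma2 = (r_y @ r_y) / (n - X1.shape[1])
    U = r_s @ r_y
    V = sigma2 * (r_s @ r_s)
    if V <= 0:
        return float("nan"), float("nan"), int(rare.sum())
    stat = U * U / V

    # Upper tail of a chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value, int(rare.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 500, 10
    G = rng.binomial(2, 0.01, size=(n, m))   # simulated rare variants
    X = rng.normal(size=(n, 2))              # simulated covariates
    y = X @ np.array([0.5, -0.3]) + 2.0 * G.sum(axis=1) + rng.normal(size=n)
    print(burden_score_test(y, G, X))
```

A simple sum is most sensitive when the combined mutations act in the same direction. A variable threshold or effects in different directions, both of which the framework covers, would call for a different statistic than the one sketched here.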
