Comparison of Statistical Tests for Association between Rare Variants and Binary Traits

Genome-wide association studies have found thousands of common genetic variants associated with a wide variety of diseases and other complex traits. However, a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to the effects of rare variants. Nonetheless, the statistical analysis of rare variants is challenging. A commonly used method is to contrast, within the same region (gene), the frequency of minor alleles at rare variants between cases and controls. However, this strategy is most useful under the assumption that the tested variants have similar effects. We previously proposed a method that can accommodate heterogeneous effects in the analysis of quantitative traits. Here we extend this method to include binary traits that can accommodate covariates. We use simulations for a variety of causal and covariate impact scenarios to compare the performance of the proposed method to standard logistic regression, C-alpha, SKAT, and EREC. We found that i) logistic regression methods perform well when the heterogeneity of the effects is not extreme and ii) SKAT and EREC have good performance under all tested scenarios but they can be computationally intensive. Consequently, it would be more computationally desirable to use a two-step strategy by (i) selecting promising genes by faster methods and ii) analyzing selected genes using SKAT/EREC. To select promising genes one can use (1) regression methods when effect heterogeneity is assumed to be low and the covariates explain a non-negligible part of trait variability, (2) C-alpha when heterogeneity is assumed to be large and covariates explain a small fraction of trait’s variability and (3) the proposed trend and heterogeneity test when the heterogeneity is assumed to be non-trivial and the covariates explain a large fraction of trait variability.

[1]  Jung-Ying Tzeng,et al.  Studying gene and gene-environment effects of uncommon and common variants on continuous traits: a marker-set approach using gene-trait similarity regression. , 2011, American journal of human genetics.

[2]  Xihong Lin,et al.  Rare-variant association testing for sequencing data with the sequence kernel association test. , 2011, American journal of human genetics.

[3]  K. Mossman The Wellcome Trust Case Control Consortium, U.K. , 2008 .

[4]  Iuliana Ionita-Laza,et al.  A New Testing Strategy to Identify Rare Variants with Either Risk or Protective Effect on Disease , 2011, PLoS genetics.

[5]  Simon C. Potter,et al.  Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls , 2007, Nature.

[6]  S. Leal,et al.  Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. , 2008, American journal of human genetics.

[7]  Kathryn Roeder,et al.  Testing for an Unusual Distribution of Rare Variants , 2011, PLoS genetics.

[8]  T. Hudson,et al.  A genome-wide association study identifies novel risk loci for type 2 diabetes , 2007, Nature.

[9]  J. Weissenbach,et al.  Mutations in PCSK9 cause autosomal dominant hypercholesterolemia , 2003, Nature Genetics.

[10]  Claudio J. Verzilli,et al.  An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People , 2012, Science.

[11]  Ryan D. Hernandez,et al.  Assessing the Evolutionary Impact of Amino Acid Mutations in the Human Genome , 2008, PLoS genetics.

[12]  Judy H. Cho,et al.  Finding the missing heritability of complex diseases , 2009, Nature.

[13]  Tanya M. Teslovich,et al.  Biological, Clinical, and Population Relevance of 95 Loci for Blood Lipids , 2010, Nature.

[14]  D. Goldstein Common genetic variation and human traits. , 2009, The New England journal of medicine.

[15]  F. Collins,et al.  Potential etiologic and functional implications of genome-wide association loci for human diseases and traits , 2009, Proceedings of the National Academy of Sciences.

[16]  Jonathan C. Cohen,et al.  Genetic and metabolic determinants of plasma PCSK9 levels. , 2009, The Journal of clinical endocrinology and metabolism.

[17]  Jonathan C. Cohen,et al.  A spectrum of PCSK9 alleles contributes to plasma levels of low-density lipoprotein cholesterol. , 2006, American journal of human genetics.

[18]  J. Pritchard Are rare variants responsible for susceptibility to complex diseases? , 2001, American journal of human genetics.

[19]  V. Salomaa,et al.  An Increased Burden of Common and Rare Lipid-Associated Risk Alleles Contributes to the Phenotypic Spectrum of Hypertriglyceridemia , 2011, Arteriosclerosis, thrombosis, and vascular biology.

[20]  L. Abel,et al.  Incorporation of covariates in multipoint model-free linkage analysis of binary traits: how important are unaffecteds? , 2001, European Journal of Human Genetics.

[21]  Suzanne M. Leal,et al.  A Novel Adaptive Method for the Analysis of Next-Generation Sequencing Data to Detect Complex Trait Associations with Rare Variants Due to Gene Main Effects and Interactions , 2010, PLoS genetics.

[22]  P. Visscher,et al.  Common polygenic variation contributes to risk of schizophrenia and bipolar disorder , 2009, Nature.

[23]  Ayellet V. Segrè,et al.  Hundreds of variants clustered in genomic loci and biological pathways affect human height , 2010, Nature.

[24]  Hongyu Zhao,et al.  Rare independent mutations in renal salt handling genes contribute to blood pressure variation , 2008, Nature Genetics.

[25]  Pablo V Gejman,et al.  Genomewide linkage scan of 409 European-ancestry and African American families with schizophrenia: suggestive evidence of linkage at 8p23.3-p21.2 and 11p13.1-q14.1 in the combined sample. , 2006, American journal of human genetics.

[26]  Dan-Yu Lin,et al.  A general framework for detecting disease associations with rare variants in sequencing studies. , 2011, American journal of human genetics.

[27]  E. Zeggini,et al.  An Evaluation of Statistical Approaches to Rare Variant Analysis in Genetic Association Studies , 2009, Genetic epidemiology.

[28]  M. Nelson,et al.  Comparison of methods and sampling designs to test for association between rare variants and quantitative traits , 2011, Genetic epidemiology.

[29]  Roded Sharan,et al.  Medical sequencing at the extremes of human body mass. , 2006, American journal of human genetics.

[30]  Donna R. Maglott,et al.  NCBI's LocusLink and RefSeq , 2000, Nucleic Acids Res..