Association Testing of Clustered Rare Causal Variants in Case-Control Studies

Biological evidence suggests that multiple causal variants in a gene may cluster physically: variants within the same protein functional domain or gene regulatory element lie in close proximity on the DNA sequence. However, current rare variant association analyses usually do not use the spatial information of variants. Here we propose a clustering method (abbreviated as “CLUSTER”) that extends the adaptive combination of P-values. Our method combines the association signals of the variants that are more likely to be causal, and its test statistic incorporates the spatial information of variants. In extensive simulations, our method outperforms several commonly used methods in many scenarios. To demonstrate its use in real data analyses, we also apply the CLUSTER test to the Dallas Heart Study data. CLUSTER is among the best methods when the effects of causal variants are all in the same direction. Because variants located in close proximity are more likely to have similar impacts on disease risk, CLUSTER is recommended for association testing of clustered rare causal variants in case-control studies.
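To make the underlying idea concrete, the following is a minimal sketch of a generic adaptive combination of P-values, the approach CLUSTER extends. It is not the CLUSTER statistic itself: the spatial weighting is not specified in this abstract and is omitted here. The function names, the truncation thresholds, and the assumption that per-variant P-values are already available for both the observed data and a set of case-control label permutations are all illustrative choices.

```python
import numpy as np


def truncated_stat(pvals, threshold):
    """Sum of -log(p) over the P-values at or below the truncation threshold."""
    pvals = np.asarray(pvals, dtype=float)
    keep = pvals <= threshold
    return float(-np.log(pvals[keep]).sum())


def adaptive_combination(p_obs, p_perm, thresholds=(0.05, 0.10, 0.20)):
    """Permutation P-value for a generic adaptive combination of P-values.

    p_obs  : shape (m,), per-variant P-values from the observed data.
    p_perm : shape (B, m), per-variant P-values from B permutations of the
             case-control labels (assumed to be computed elsewhere).
    thresholds : candidate truncation thresholds (hypothetical defaults).
    """
    p_obs = np.asarray(p_obs, dtype=float)
    p_perm = np.asarray(p_perm, dtype=float)
    n_perm = p_perm.shape[0]

    # Row 0 is the observed data; rows 1..B are the permutations.
    all_p = np.vstack([p_obs, p_perm])
    stats = np.array(
        [[truncated_stat(row, t) for t in thresholds] for row in all_p]
    )
    perm_stats = stats[1:]

    # For every row and threshold, the fraction of permutation statistics
    # that are at least as large as that row's statistic.
    exceeds = perm_stats[None, :, :] >= stats[:, None, :]
    per_threshold_p = exceeds.mean(axis=1)

    # Adaptive step: keep the most favourable threshold for each row, then
    # calibrate the observed minimum against the permutation minima.
    min_p = per_threshold_p.min(axis=1)
    return (1 + np.sum(min_p[1:] <= min_p[0])) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    null_perm = rng.uniform(0.01, 1.0, size=(200, 10))  # simulated null P-values
    observed = np.array([1e-8] * 5 + [0.5] * 5)         # strong signal in 5 variants
    print(adaptive_combination(observed, null_perm))
```

Taking the minimum over several truncation thresholds lets the data choose how many of the smallest P-values to combine, and the second round of permutation comparison corrects for that choice.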
