ACAT: A Fast and Powerful p Value Combination Method for Rare-Variant Analysis in Sequencing Studies.

Set-based analysis, which jointly tests the association of a group of variants, has emerged as a popular tool for analyzing rare and low-frequency variants in sequencing studies. Existing set-based tests can suffer a substantial loss of power when only a small proportion of variants are causal, and their power can be sensitive to the number, effect sizes, and effect directions of the causal variants and to the choice of weights. Here we propose the aggregated Cauchy association test (ACAT), a general, powerful, and computationally efficient p value combination method for boosting power in sequencing studies. First, by combining variant-level p values, we use ACAT to construct a set-based test (ACAT-V) that is particularly powerful when a variant set contains only a small number of causal variants. Second, by combining different variant-set-level p values, we use ACAT to construct an omnibus test (ACAT-O) that combines the strengths of multiple complementary set-based tests, including the burden test, the sequence kernel association test (SKAT), and ACAT-V. Through analysis of extensively simulated data and of whole-genome sequencing data from the Atherosclerosis Risk in Communities (ARIC) study, we demonstrate that ACAT-V complements SKAT and the burden test, and that ACAT-O has substantially more robust and higher power than the alternative tests.
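The p value combination step at the core of ACAT can be illustrated with a minimal sketch. It assumes the standard Cauchy combination form, which the abstract does not spell out: each p value p_i is mapped to tan((0.5 - p_i)π), the mapped values are averaged with nonnegative weights w_i, and the combined p value is read from the standard Cauchy tail. The function name `cauchy_combine`, the constant `TINY_P`, and the example p values are hypothetical and are not taken from the authors' software.

```python
import math

# Assumption: below this threshold tan((0.5 - p) * pi) is replaced by its
# small-p approximation 1 / (p * pi), because 0.5 - p rounds to 0.5 in
# double precision and the tangent would otherwise lose the signal.
TINY_P = 1e-16


def cauchy_combine(pvalues, weights=None):
    """Combine p values with a weighted Cauchy statistic (illustrative sketch)."""
    p = [float(x) for x in pvalues]
    if not p:
        raise ValueError("need at least one p value")
    if any(not (0.0 < x < 1.0) for x in p):
        raise ValueError("p values must lie strictly between 0 and 1")

    # Equal weights unless the caller supplies their own.
    w = [1.0] * len(p) if weights is None else [float(x) for x in weights]
    if len(w) != len(p) or any(x < 0 for x in w) or sum(w) <= 0:
        raise ValueError("weights must be nonnegative, match the p values, and not all be zero")

    total = sum(w)
    stat = 0.0
    for p_i, w_i in zip(p, w):
        if p_i < TINY_P:
            component = 1.0 / (p_i * math.pi)
        else:
            component = math.tan((0.5 - p_i) * math.pi)
        stat += (w_i / total) * component

    # Upper tail of the standard Cauchy distribution: 0.5 - atan(stat) / pi.
    # atan2(1, stat) equals pi/2 - atan(stat) and stays accurate for large stat.
    return math.atan2(1.0, stat) / math.pi


if __name__ == "__main__":
    # Variant-level combination in the spirit of ACAT-V: one strong signal
    # among otherwise null variants (made-up p values).
    print(cauchy_combine([1e-8, 0.5, 0.7, 0.9]))
    # Omnibus combination in the spirit of ACAT-O: made-up p values standing
    # in for a burden test, SKAT, and ACAT-V, with equal weights.
    print(cauchy_combine([0.20, 0.03, 0.004]))
```

Because the statistic is a weighted average of heavy-tailed terms, a single very small p value dominates the sum, which is consistent with the abstract's claim that ACAT-V is powerful when only a few variants in a set are causal. The combined p value also has a closed form, so no resampling is needed in this sketch.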
