A Unified Mixed‐Effects Model for Rare‐Variant Association in Sequencing Studies

For rare‐variant association analysis, due to extreme low frequencies of these variants, it is necessary to aggregate them by a prior set (e.g., genes and pathways) in order to achieve adequate power. In this paper, we consider hierarchical models to relate a set of rare variants to phenotype by modeling the effects of variants as a function of variant characteristics while allowing for variant‐specific effect (heterogeneity). We derive a set of two score statistics, testing the group effect by variant characteristics and the heterogeneity effect. We make a novel modification to these score statistics so that they are independent under the null hypothesis and their asymptotic distributions can be derived. As a result, the computational burden is greatly reduced compared with permutation‐based tests. Our approach provides a general testing framework for rare variants association, which includes many commonly used tests, such as the burden test [Li and Leal, 2008] and the sequence kernel association test [Wu et al., 2011], as special cases. Furthermore, in contrast to these tests, our proposed test has an added capacity to identify which components of variant characteristics and heterogeneity contribute to the association. Simulations under a wide range of scenarios show that the proposed test is valid, robust, and powerful. An application to the Dallas Heart Study illustrates that apart from identifying genes with significant associations, the new method also provides additional information regarding the source of the association. Such information may be useful for generating hypothesis in future studies.

[1]  Ronald M Peshock,et al.  The Dallas Heart Study: a population-based probability sample for the multidisciplinary study of ethnic differences in cardiovascular health. , 2004, The American journal of cardiology.

[2]  Judy H. Cho,et al.  Finding the missing heritability of complex diseases , 2009, Nature.

[3]  Wei Pan,et al.  A Data-Adaptive Sum Test for Disease Association with Multiple Common or Rare Variants , 2010, Human Heredity.

[4]  John S Witte,et al.  Using hierarchical modeling in genetic association studies with multiple markers: application to a case-control study of bladder cancer. , 2004, Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology.

[5]  N. Breslow,et al.  Approximate inference in generalized linear mixed models , 1993 .

[6]  S. Leal,et al.  Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. , 2008, American journal of human genetics.

[7]  Huan Liu,et al.  A new chi-square approximation to the distribution of non-negative definite quadratic forms in non-central normal variables , 2009, Comput. Stat. Data Anal..

[8]  Kathryn Roeder,et al.  Testing for an Unusual Distribution of Rare Variants , 2011, PLoS genetics.

[9]  Paul J. Rathouz,et al.  An Evolutionary Framework for Association Testing in Resequencing Studies , 2010, PLoS genetics.

[10]  W. Thilly,et al.  A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). , 2007, Mutation research.

[11]  Dawei Liu,et al.  Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models , 2008, BMC Bioinformatics.

[12]  Xihong Lin,et al.  Rare-variant association testing for sequencing data with the sequence kernel association test. , 2011, American journal of human genetics.

[13]  Dan-Yu Lin,et al.  A general framework for detecting disease associations with rare variants in sequencing studies. , 2011, American journal of human genetics.

[14]  E. Zeggini,et al.  An Evaluation of Statistical Approaches to Rare Variant Analysis in Genetic Association Studies , 2009, Genetic epidemiology.

[15]  Claudio J. Verzilli,et al.  An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People , 2012, Science.

[16]  David B. Goldstein,et al.  Rare Variants Create Synthetic Genome-Wide Associations , 2010, PLoS biology.

[17]  Warren C. Lathe,et al.  Prediction of deleterious human alleles. , 2001, Human molecular genetics.

[18]  Timothy B. Stockwell,et al.  Genetic Variation in an Individual Human Exome , 2008, PLoS genetics.

[19]  Xihong Lin,et al.  Hypothesis testing in semiparametric additive mixed models. , 2003, Biostatistics.

[20]  Xihong Lin Variance component testing in generalised linear models with random effects , 1997 .

[21]  Colin B Begg,et al.  Hierarchical Modeling for Estimating Relative Risks of Rare Genetic Variants: Properties of the Pseudo‐Likelihood Method , 2011, Biometrics.

[22]  Michael D. Perlman,et al.  Combining Independent Chi-Squared Tests , 1978 .

[23]  Xihong Lin,et al.  Semiparametric Regression of Multidimensional Genetic Pathway Data: Least‐Squares Kernel Machines and Linear Mixed Models , 2007, Biometrics.

[24]  B. Maher Personal genomes: The case of the missing heritability , 2008, Nature.

[25]  Gary K. Chen,et al.  Enriching the analysis of genomewide association studies with hierarchical modeling. , 2007, American journal of human genetics.

[26]  Shamil R Sunyaev,et al.  Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. , 2007, American journal of human genetics.

[27]  Jacob A. Tennessen,et al.  Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes , 2012, Science.

[28]  Xihong Lin,et al.  Optimal tests for rare variant effects in sequencing association studies. , 2012, Biostatistics.