Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data

Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analyses that incorporate such annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially variants of unknown function that lie outside protein-coding regions. Directly using a single functional annotation as a weight in existing dispersion and burden tests can cause a substantial loss of power when that annotation does not predict a variant's risk status. Here, we have developed unified tests that use multiple functional annotations simultaneously for integrative association analysis, together with efficient computational techniques. We show that the proposed tests significantly improve power when functional annotations predict variant risk status. Importantly, when the annotations are not predictive, the proposed tests lose only minimal power relative to existing dispersion and burden tests; under certain circumstances they can even gain power by learning, in a data-adaptive manner, a weight that better approximates the underlying disease model. Because the tests can be constructed from the summary statistics of existing dispersion and burden tests for sequencing data, they allow meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants for lipid traits in Metabochip data on 12,281 individuals from eight studies. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and both low-density lipoprotein and total cholesterol, associations that standard dispersion and burden tests missed.
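The abstract does not spell out the test statistic, so the following is only a generic sketch of how multiple annotations can unify burden and dispersion tests, not necessarily the authors' method. It assumes the common variance-component formulation in which per-variant effects are modeled as beta = Z alpha with alpha ~ N(0, tau I), where Z is an m x K matrix of annotations (for example an intercept column and a functional score such as Eigen). The score statistic for H0: tau = 0 is then Q = S^T Z Z^T S, where S is the vector of per-variant score statistics. A single annotation column reduces Q to a squared weighted burden statistic, a diagonal Z gives a SKAT-type dispersion statistic, and several columns combine annotations. Per-study score vectors and covariance matrices are summed, as in fixed-effect score-based meta-analysis, so only summary statistics are needed. All function names and numbers are hypothetical, and a seeded Monte Carlo p-value stands in for the mixture-of-chi-squares calculations (e.g. Davies' method) usually used in practice.

```python
import math
import random


def meta_combine(scores, covs):
    """Fixed-effect meta-analysis of score statistics: sum the per-study
    score vectors S_s and covariance matrices V_s (assumes the same
    variants, in the same order, in every study)."""
    m = len(scores[0])
    S = [sum(s[j] for s in scores) for j in range(m)]
    V = [[sum(c[i][j] for c in covs) for j in range(m)] for i in range(m)]
    return S, V


def annotation_q(S, Z):
    """Q = S^T Z Z^T S for an m x K annotation matrix Z
    (rows = variants, columns = annotations)."""
    K = len(Z[0])
    t = [sum(Z[j][k] * S[j] for j in range(len(S))) for k in range(K)]
    return sum(x * x for x in t)


def cholesky(V):
    """Lower-triangular L with L L^T = V (V must be positive definite)."""
    m = len(V)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mc_pvalue(S, V, Z, n_sim=20000, seed=1):
    """Monte Carlo p-value for Q under H0, where S ~ N(0, V).
    (Hypothetical stand-in for an exact mixture-of-chi-squares method.)"""
    q_obs = annotation_q(S, Z)
    L = cholesky(V)
    m = len(S)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        # Draw a null score vector S_null = L e with e ~ N(0, I).
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        s_null = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
        if annotation_q(s_null, Z) >= q_obs:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Two hypothetical studies, three variants; all numbers are made up.
    scores = [[2.0, 1.5, -0.3], [1.2, 0.9, 0.1]]
    covs = [[[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]],
            [[0.8, 0.0, 0.0], [0.0, 0.8, 0.1], [0.0, 0.1, 0.8]]]
    S, V = meta_combine(scores, covs)
    w = [1.0, 2.0, 0.5]  # hypothetical per-variant annotation score
    burden_Z = [[x] for x in w]  # one column: squared weighted burden
    # Diagonal Z: SKAT-type dispersion statistic sum_j w_j^2 S_j^2.
    skat_Z = [[w[i] if i == k else 0.0 for k in range(3)] for i in range(3)]
    multi_Z = [[1.0, x] for x in w]  # intercept column + annotation column
    for name, Z in [("burden", burden_Z), ("dispersion", skat_Z), ("multi", multi_Z)]:
        print(name, round(annotation_q(S, Z), 4), mc_pvalue(S, V, Z))
```

In this toy example the combined score vector is S = (3.2, 2.4, -0.2), so the burden statistic is (w^T S)^2 = 7.9^2, about 62.41. The design choice worth noting is that only S and V enter the computation, which is what allows studies to be pooled without sharing individual-level genotypes.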
