A Mixed-Effects Model for Powerful Association Tests in Integrative Functional Genomics.

Genome-wide association studies (GWASs) have identified thousands of genetic variants associated with many complex diseases; however, these variants explain only a small fraction of the heritability. Recently, genetic association studies that leverage external transcriptome data have received much attention and shown promise for discovering novel variants. One such approach, PrediXcan, tests the association between disease and gene expression predicted from genetic variants. This approach has limitations. The predicted gene expression may be biased because it comes from regularized regression fitted to reference studies of only moderate sample size. Further, some variants can individually influence disease risk through functional mechanisms other than expression. Thus, testing only the association of predicted gene expression, as proposed in PrediXcan, can lose power. To tackle these challenges, we consider a unified mixed-effects model that formulates the association of intermediate phenotypes, such as imputed gene expression, through fixed effects, while allowing the residual effects of individual variants to be random. We consider a set-based score testing framework, MiST (mixed effects score test), and propose two data-driven combination approaches to jointly test for the fixed and random effects. We establish the asymptotic distributions, which enable rapid calculation of p values for genome-wide analyses, and provide separate p values for the fixed and random effects to improve interpretability relative to GWASs. Extensive simulations demonstrate that our approaches are more powerful than existing ones. We apply our approach to a large-scale GWAS of colorectal cancer and, after adjusting for all known loci, identify two genes, POU5F1B and ATF1, that PrediXcan would have missed.
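
As a rough illustration of the model described in the abstract, it can be written in a form like the one below. This is a sketch only: the abstract gives no equations, so all of the notation is an assumption. The assumed symbols are the link function g, the covariates X_i, the genotype vector G_i for the variants in a gene set, the externally estimated expression weights \hat{w}, the fixed effect \theta, and the random effects \delta_j with variance \tau^2.

```latex
% Assumed notation; the abstract itself does not give these formulas.
% Y_i: phenotype, X_i: covariates, G_i: genotypes of the p variants in a gene set,
% \hat{w}: weights from a reference transcriptome study, so G_i^\top \hat{w}
% is the predicted (imputed) gene expression for individual i.
\begin{align*}
  g\bigl(\mathrm{E}[Y_i \mid X_i, G_i]\bigr)
    &= X_i^\top \alpha
     + \bigl(G_i^\top \hat{w}\bigr)\,\theta
     + G_i^\top \delta, \\
  \delta_j &\ \text{independent with}\ \mathrm{E}[\delta_j] = 0,\quad
  \mathrm{Var}(\delta_j) = \tau^2, \qquad j = 1, \dots, p, \\
  H_0 &:\ \theta = 0 \ \text{and}\ \tau^2 = 0 .
\end{align*}
```

Under this reading, a test of \theta = 0 alone corresponds to a PrediXcan-style test of predicted expression, a test of \tau^2 = 0 alone corresponds to a variance-component test of residual variant effects, and the joint null hypothesis is what the combined tests described in the abstract address.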
