A Unified Framework for Association Analysis with Multiple Related Phenotypes

We consider the problem of assessing associations between multiple related outcome variables and a single explanatory variable of interest. This problem arises in many settings, including genetic association studies, where the explanatory variable is the genotype at a genetic variant. We outline a framework for conducting this type of analysis, based on Bayesian model comparison and model averaging for multivariate regressions. This framework unifies several common approaches to this problem and includes both standard univariate and standard multivariate association tests as special cases. The framework also unifies the problems of testing for associations and explaining associations, that is, identifying which outcome variables are associated with genotype. This provides an alternative to the usual, but conceptually unsatisfying, approach of resorting to univariate tests when explaining and interpreting significant multivariate findings. The method is computationally tractable genome-wide for modest numbers of phenotypes (e.g. 5–10) and can be applied to summary data, without access to raw genotype and phenotype data. We illustrate the methods on simulated examples and on a genome-wide association study of blood lipid traits, where we identify 18 potential novel genetic associations that were not identified by univariate analyses of the same data.
