Sparse precision matrix estimation in phenotypic trait evolution models

Phylogenetic trait evolution models allow for the estimation of evolutionary correlations between a set of traits observed in a sample of related organisms. By directly modeling the evolution of the traits along an estimable phylogenetic tree, these models control for shared evolutionary history. Relevant correlations are usually assessed through the highest posterior density intervals of their marginal distributions. However, the selected correlations alone may not provide the full picture of trait relationships. In contrast, the association structure, expressed through a graph that encodes partial correlations, can reveal sparsity patterns that highlight direct associations between traits. To develop a model-based method for identifying this association structure, we explore the use of Gaussian graphical models (GGMs) for covariance selection. We model the precision matrix with a G-Wishart conjugate prior, which results in sparse precision estimates. Furthermore, the model naturally allows for Bayes factor tests of association between the traits, with no additional computation required. We evaluate our approach through Monte Carlo simulations and applications that examine the association structure and evolutionary correlations of phenotypic traits in Darwin’s finches and of genomic and phenotypic traits in prokaryotes. Our approach provides accurate graph estimates and lower errors for the precision and correlation parameter estimates, particularly for conditionally independent traits, which are the target for sparsity in GGMs.
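
As a small illustration of why a graph of partial correlations can differ from the marginal correlations, the following sketch reads the association structure off a toy precision matrix. It is not the estimation method described above (no phylogeny, no G-Wishart prior, no sampling); the matrix `K`, the trait labels, and all function names are hypothetical and chosen only for illustration. It uses the standard identity that the partial correlation between traits i and j is -K_ij / sqrt(K_ii K_jj), so a zero off-diagonal entry of the precision matrix corresponds to conditional independence.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def partial_correlations(precision):
    """Partial correlation matrix implied by a precision matrix."""
    n = len(precision)
    return [
        [
            1.0 if i == j
            else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def marginal_correlations(covariance):
    """Marginal correlation matrix implied by a covariance matrix."""
    n = len(covariance)
    return [
        [
            covariance[i][j] / math.sqrt(covariance[i][i] * covariance[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def graph_edges(precision, names, tol=1e-12):
    """Edges of the graph: pairs of traits with a non-zero precision entry."""
    n = len(precision)
    return [
        (names[i], names[j])
        for i in range(n)
        for j in range(i + 1, n)
        if abs(precision[i][j]) > tol
    ]


# Hypothetical precision matrix for a chain graph: trait_A - trait_B - trait_C.
names = ["trait_A", "trait_B", "trait_C"]
K = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]

partial = partial_correlations(K)
marginal = marginal_correlations(invert(K))
edges = graph_edges(K, names)
```

In this toy case, `trait_A` and `trait_C` have a non-zero marginal correlation but a zero partial correlation: their association runs entirely through `trait_B`, so the graph contains no edge between them. This is the kind of conditional independence that sparse precision estimation aims to recover.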
