Combining chains of Bayesian models with Markov melding

A challenge for practitioners of Bayesian inference is specifying a single model that incorporates multiple relevant, heterogeneous sources of data. It may be easier to specify a distinct submodel for each source of data and then join the submodels together. We consider chains of submodels, in which each submodel relates directly to its neighbours via common quantities, which may be parameters or deterministic functions thereof. We propose chained Markov melding, an extension of Markov melding, as a generic method for combining chains of submodels into a joint model. One challenge we address is appropriately capturing the prior dependence between common quantities within a submodel, whilst also reconciling differences in the priors for the same common quantity between two adjacent submodels. Estimating the posterior of the resulting joint model is also challenging, so we describe a sampler that uses the chain structure to incorporate the information contained in the submodels in multiple stages, possibly in parallel. We demonstrate our methodology using two examples. The first is an ecological integrated population model, in which multiple sources of data are required to accurately estimate population immigration and reproduction rates. The second is a joint longitudinal and time-to-event model with uncertain, submodel-derived event times. Chained Markov melding is a conceptually appealing approach to integrating submodels in these settings.
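As a rough illustration of what reconciling two priors for the same common quantity can involve, the sketch below applies logarithmic pooling to two Gaussian priors. This is an assumption-laden toy and not the method of the paper: the Gaussian form, the pooling weights, and the function name `log_pool_gaussians` are all hypothetical choices made for illustration. It relies only on the standard result that a weighted product of Gaussian densities is proportional to a Gaussian whose precision is the weighted sum of the input precisions.

```python
import math


def log_pool_gaussians(means, sds, weights):
    """Pool Gaussian priors N(mean_i, sd_i^2) via a weighted product.

    The pooled density is proportional to prod_i N(x; mean_i, sd_i^2)^w_i,
    which is again Gaussian. Hypothetical helper, for illustration only.
    """
    # Each prior contributes its precision, scaled by its pooling weight.
    precisions = [w / (s * s) for w, s in zip(weights, sds)]
    total_precision = sum(precisions)
    # The pooled mean is the precision-weighted average of the input means.
    pooled_mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    pooled_sd = math.sqrt(1.0 / total_precision)
    return pooled_mean, pooled_sd


# Two adjacent submodels disagree about the prior for a shared quantity.
pooled_mean, pooled_sd = log_pool_gaussians(
    means=[0.0, 2.0], sds=[1.0, 1.0], weights=[0.5, 0.5]
)
print(pooled_mean, pooled_sd)
```

With equal weights and equal standard deviations the pooled prior sits midway between the two inputs; unequal weights or variances would pull it towards the more heavily weighted or more precise prior.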
