Missing data analysis and imputation via latent Gaussian Markov random fields

In this paper we recast the problem of missing values in the covariates of a regression model as a latent Gaussian Markov random field (GMRF) model in a fully Bayesian framework. Our approach defines the covariate imputation sub-model as a latent effect with a GMRF structure. We show how this formulation works for continuous covariates and provide some insight into how it could be extended to categorical covariates. The resulting Bayesian hierarchical model fits naturally within the integrated nested Laplace approximation (INLA) framework, which we use for model fitting. Hence, our work fills an important gap in the INLA methodology, as it makes it possible to fit models with missing values in the covariates. As in any other fully Bayesian framework, relying on INLA for model fitting makes it possible to formulate a joint model for the data, the imputed covariates and their missingness mechanism. In this way, we are able to tackle the more general problem of assessing the missingness mechanism by conducting a sensitivity analysis on the different alternatives for modeling the non-observed covariates. Finally, we illustrate the proposed approach with two examples on modeling health risk factors and disease mapping. Here, we rely on two different imputation mechanisms, based on a typical multiple linear regression and a spatial model, respectively. Given the speed of model fitting with INLA, we are able to fit joint models in a short time and to easily conduct sensitivity analyses.
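To make the idea of imputing a covariate through a GMRF more concrete, the following is a minimal sketch of the standard conditioning identity for a Gaussian vector with precision matrix Q: the missing entries have conditional mean mu_m - Q_mm^{-1} Q_mo (x_o - mu_o) given the observed ones. It is not the authors' INLA implementation. It computes only a point imputation and ignores the response model, the hyperparameters and the missingness mechanism that the joint model described above would include. The function names, the tridiagonal precision matrix and the toy data are all assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Intended only for the small dense systems in this illustration.
    """
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gmrf_conditional_mean(Q, mu, x):
    """Fill the None entries of x with their conditional means.

    Q  : precision matrix of the GMRF (list of lists, positive definite)
    mu : marginal mean vector
    x  : covariate values, with None marking a missing value
    """
    miss = [i for i, v in enumerate(x) if v is None]
    obs = [i for i, v in enumerate(x) if v is not None]
    # Block of the precision matrix for the missing nodes.
    Q_mm = [[Q[i][j] for j in miss] for i in miss]
    # Right-hand side: -Q_mo (x_o - mu_o).
    rhs = [-sum(Q[i][j] * (x[j] - mu[j]) for j in obs) for i in miss]
    delta = solve(Q_mm, rhs)
    filled = list(x)
    for i, d in zip(miss, delta):
        filled[i] = mu[i] + d
    return filled


# Hypothetical example: five nodes on a chain, tridiagonal precision matrix
# (2 on the diagonal, -1 between neighbours), zero mean, three missing values.
n = 5
Q = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
mu = [0.0] * n
x = [1.0, None, 3.0, None, None]

filled = gmrf_conditional_mean(Q, mu, x)
print(filled)
```

In this toy chain, the missing value at the second node is imputed as the average of its two observed neighbours, which is how the sparse precision matrix encodes the Markov property. A fully Bayesian treatment would instead keep the missing values as latent variables and propagate their uncertainty into the regression coefficients.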
