Inference for Singly Imputed Synthetic Data Based on Posterior Predictive Sampling under Multivariate Normal and Multiple Linear Regression Models

This paper develops likelihood-based finite-sample inference for singly imputed synthetic data generated via posterior predictive sampling under multivariate normal and multiple linear regression models. Currently available methodology for drawing valid inference on population parameters from synthetic data is based on concepts of multiple imputation for missing data, and therefore requires the release of multiple synthetic datasets. The methodology developed here demonstrates that, contrary to common belief, valid inference about meaningful model parameters can be drawn from a single synthetic dataset under these two models, provided the model structure is fully exploited.
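
To make the data-generation step concrete, the following is a minimal sketch of how a single synthetic dataset might be produced by posterior predictive sampling under a multivariate normal model. It is an illustration under stated assumptions, not the paper's procedure: the function name `synthesize_mvn` is hypothetical, and the noninformative prior proportional to |Sigma|^{-(p+1)/2} is assumed here because it leads to a standard normal and inverse-Wishart posterior; the paper's own prior may differ. The sketch covers only the generation of the synthetic data, not the inference drawn from it.

```python
import numpy as np


def synthesize_mvn(x, rng):
    """Draw one synthetic dataset by posterior predictive sampling (sketch)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    if n - 1 < p:
        raise ValueError("need n - 1 >= p for a proper posterior draw")

    xbar = x.mean(axis=0)
    centered = x - xbar
    s = centered.T @ centered  # sum-of-squares-and-products matrix

    # ASSUMPTION: prior proportional to |Sigma|^{-(p+1)/2}, which gives
    #   Sigma^{-1} | x  ~ Wishart(n - 1, S^{-1})
    #   mu | Sigma, x   ~ N(xbar, Sigma / n)
    s_inv = np.linalg.inv(s)
    s_inv = (s_inv + s_inv.T) / 2  # remove floating-point asymmetry

    # A Wishart(n - 1, S^{-1}) draw as a sum of n - 1 outer products
    # of independent N(0, S^{-1}) vectors.
    z = rng.multivariate_normal(np.zeros(p), s_inv, size=n - 1)
    precision = z.T @ z
    sigma = np.linalg.inv(precision)
    sigma = (sigma + sigma.T) / 2

    # Draw the mean given the covariance, then the synthetic records.
    mu = rng.multivariate_normal(xbar, sigma / n)
    return rng.multivariate_normal(mu, sigma, size=n)


# Usage with simulated "original" data (the values are illustrative only).
rng = np.random.default_rng(0)
original = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.5], [0.5, 2.0]], size=200)
synthetic = synthesize_mvn(original, rng)
```

Only `synthetic` would be released in place of `original`. Because the parameters are drawn once from the posterior before the records are generated, the synthetic data carry extra variability beyond ordinary sampling error, which is why standard complete-data inference applied naively to a single synthetic dataset is generally not valid.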