Exploring the robustness of Bayesian and information-theoretic methods for predictive inference

Given a set of sample data, we study three alternative methods for determining the predictive distribution of an unseen data vector. In particular, we are interested in how the predictive accuracy of these three methods behaves as a function of the degree to which the domain assumptions are violated. We explore this question empirically by using artificially generated data sets in which the assumptions can be violated in various ways. Our empirical results suggest that if the model assumptions are only mildly violated, marginalization over the model parameters may not be necessary in practice: in this case the computationally much simpler predictive distribution based on a single, maximum posterior probability model shows performance similar to that of the computationally more demanding marginal likelihood approach. The results also support Rissanen's theoretical results on the usefulness of Jeffreys' prior distribution for the model parameters.
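As a brief illustration of the distinction at issue (the notation below is ours, introduced only for exposition): writing $D$ for the observed sample, $\theta$ for the model parameters, and $x_{\mathrm{new}}$ for the unseen data vector, the single-model (MAP) predictive distribution and the marginal (posterior predictive) distribution can be sketched as

\[
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} P(\theta \mid D),
\qquad
P_{\mathrm{MAP}}(x_{\mathrm{new}} \mid D) = P(x_{\mathrm{new}} \mid \hat{\theta}_{\mathrm{MAP}}),
\]
\[
P_{\mathrm{marg}}(x_{\mathrm{new}} \mid D) = \int P(x_{\mathrm{new}} \mid \theta)\, P(\theta \mid D)\, d\theta ,
\]

where the second form integrates over the parameter uncertainty that the first form ignores, and where the prior over $\theta$ (e.g., Jeffreys' prior) enters through $P(\theta \mid D)$.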