Given a set of possible models for variables X and a set of possible parameters for each model, the Bayesian “estimate” of the probability distribution for X given observed data is obtained by averaging over the possible models and their parameters. An often-used approximation for this estimate is obtained by selecting a single model and averaging over its parameters. The approximation is useful because it is computationally efficient, and because it provides a model that facilitates understanding of the domain. A common criterion for model selection is the posterior probability of the model. Another criterion for model selection, proposed by San Martini and Spezzaferri (1984), is the predictive performance of a model for the next observation to be seen. From the standpoint of domain understanding, both criteria are useful, because one identifies the model that is most likely, whereas the other identifies the model that is the best predictor of the next observation. To highlight the difference, we refer to the posterior-probability and alternative criteria as the scientific criterion (SC) and engineering criterion (EC), respectively. When we are interested in predicting the next observation, the model-averaged estimate is at least as good as that produced by EC, which itself is at least as good as the estimate produced by SC. We show experimentally that, for Bayesian-network models containing discrete variables only, the predictive performance of the model average can be significantly better than that of single models selected by either criterion, and that differences between models selected by the two criteria can be substantial.
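The ordering among the model average, EC, and SC can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the setup used in the paper: the two candidate models are hypothetical coin models (a fair coin with no free parameters, and a coin of unknown bias with a uniform prior), and predictive performance is measured by the expected log score of the next observation, with the expectation taken under the model-averaged predictive distribution. The function names are likewise hypothetical.

```python
import math


def posteriors_and_predictives(k, n, prior=(0.5, 0.5)):
    """Return model posteriors and P(next = 1) for two toy coin models.

    k is the number of ones observed in n binary observations.
    Model 0: fair coin (no free parameters).
    Model 1: unknown bias with a uniform Beta(1, 1) prior.
    Both models are illustrative assumptions, not taken from the paper.
    """
    # Marginal likelihood of the observed sequence under each model.
    ml0 = 0.5 ** n
    ml1 = math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)
    evidence = prior[0] * ml0 + prior[1] * ml1
    post = [prior[0] * ml0 / evidence, prior[1] * ml1 / evidence]
    # Predictive probability that the next observation is 1,
    # averaging over each model's parameters.
    preds = [0.5, (k + 1) / (n + 2)]
    return post, preds


def expected_log_score(p_ref, p_model):
    """Expected log score of p_model when the next observation follows p_ref."""
    return p_ref * math.log(p_model) + (1 - p_ref) * math.log(1 - p_model)


def compare(k, n):
    post, preds = posteriors_and_predictives(k, n)
    # Model-averaged predictive: weight each model's predictive by its posterior.
    p_avg = sum(w * p for w, p in zip(post, preds))
    scores = [expected_log_score(p_avg, p) for p in preds]
    sc = max(range(len(post)), key=lambda m: post[m])      # most probable model
    ec = max(range(len(post)), key=lambda m: scores[m])    # best expected predictor
    return {
        "posterior": post,
        "p_avg": p_avg,
        "sc_model": sc,
        "ec_model": ec,
        "score_avg": expected_log_score(p_avg, p_avg),
        "score_ec": scores[ec],
        "score_sc": scores[sc],
    }


if __name__ == "__main__":
    for key, value in compare(k=7, n=10).items():
        print(key, value)
```

Under the log score assumed here, the averaged predictive can never score worse than any single model (Gibbs' inequality), and the EC model scores at least as well as the SC model by construction, which mirrors the ordering stated in the abstract.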
[1] G. Chow. A comparison of the information and posterior probability criteria for model selection. 1981.
[2] W. H. Sewell, et al. Social Class, Parental Encouragement, and Educational Aspirations. American Journal of Sociology, 1968.
[3] Gregory F. Cooper, et al. A Bayesian Method for the Induction of Probabilistic Networks from Data. 1992.
[4] H. Akaike, et al. Information Theory and an Extension of the Maximum Likelihood Principle. 1973.
[5] G. Schwarz. Estimating the Dimension of a Model. 1978.
[6] A. P. Dawid, et al. Present position and potential developments: some personal views. 1984.
[7] P. Spirtes, et al. Causation, prediction, and search. 1993.
[8] J. Bernardo. Expected Information as Expected Utility. 1979.
[9] David Draper, et al. Assessment and Propagation of Model Uncertainty. 2011.
[10] W. Hays. Statistical theory. Annual review of psychology, 1968.
[11] J. Pearl. Causal diagrams for empirical research. 1995.
[12] Fulvio Spezzaferri, et al. A Predictive Model Selection Criterion. 1984.