This paper compares three approaches to the problem of selecting among probability models to fit data: (1) use of statistical criteria such as Akaike's information criterion and Schwarz's "Bayesian information criterion," (2) maximization of the posterior probability of the model, and (3) maximization of an "effectiveness ratio" trading off accuracy and computational cost. The unifying characteristic of the approaches is that all can be viewed as maximizing a penalized likelihood function. The second approach, with suitable prior distributions, has been shown to reduce to the first. This paper shows that the third approach reduces to the second for a particular form of the effectiveness ratio, and illustrates all three approaches with the problem of selecting the number of components in a mixture of Gaussian distributions. Unlike the first two approaches, the third can be used even when the candidate models are chosen for computational efficiency, without regard to physical interpretation, so that the likelihoods and the prior distribution over models cannot be interpreted literally. As the most general and computationally oriented of the approaches, it is especially useful for artificial intelligence applications.
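The penalized-likelihood view mentioned above can be made explicit with the standard forms of the two criteria; the following is a sketch of the well-known definitions rather than a formula quoted from the paper. Here L(θ̂) is the maximized likelihood, k the number of free parameters, and n the sample size:

```latex
\mathrm{AIC} = -2 \log L(\hat{\theta}) + 2k,
\qquad
\mathrm{BIC} = -2 \log L(\hat{\theta}) + k \log n .
```

Minimizing either criterion is equivalent to maximizing a penalized log-likelihood, log L(θ̂) − c(k, n), with penalty c(k, n) = k for AIC and c(k, n) = (k/2) log n for BIC; maximizing the posterior probability of the model likewise maximizes log p(x | M) + log p(M), where the log prior plays the role of the penalty.

To make the illustrative problem concrete, the sketch below selects the number of Gaussian mixture components by minimizing BIC. It uses scikit-learn and synthetic data, both of which are assumptions of this example and not anything specified in the paper:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic one-dimensional data drawn from two Gaussian components
# (hypothetical example data, not from the paper).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 300),
                    rng.normal(3.0, 0.5, 200)]).reshape(-1, 1)

# Fit mixtures with 1..5 components and keep the one with the lowest BIC,
# i.e. the highest penalized log-likelihood.
best_k, best_bic = None, np.inf
for k in range(1, 6):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic = gm.bic(X)  # -2 log L + (free parameters) * log n
    if bic < best_bic:
        best_k, best_bic = k, bic

print(f"selected {best_k} components (BIC = {best_bic:.1f})")
```

On well-separated data such as this, the BIC minimum typically falls at two components; AIC, with its lighter penalty, is more prone to admitting extra components.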