Comparing Prequential Model Selection Criteria in Supervised Learning of Mixture Models

In this paper we study prequential model selection criteria in supervised learning domains. The main problem with this approach is that the criterion is sensitive to the order in which the data are processed. We discuss several approaches for addressing the ordering problem, and empirically compare their performance in real-world supervised model selection tasks. The empirical results demonstrate that with the prequential approach it is quite easy to find predictive models that are significantly more accurate classifiers than the models found by the standard unsupervised marginal likelihood criterion. The results also suggest that averaging over random orderings may be a more sensible strategy for solving the ordering problem than trying to find the ordering that optimizes the prequential model selection criterion.
