On Supervised Selection of Bayesian Networks

Given a set of possible models (e.g., Bayesian network structures) and a data sample, the task in the unsupervised model selection problem is to choose the model that most accurately represents the joint probability distribution of the domain. In supervised model selection, by contrast, it is known a priori that the chosen model will be used for prediction tasks involving more "focused" predictive distributions, such as the conditional distribution of a class variable given the other variables. Although focused predictive distributions can be produced from the joint probability distribution by marginalization, in practice the best model in the unsupervised sense does not necessarily perform well in supervised domains. In particular, the standard marginal likelihood score is a criterion for the unsupervised task and, although it is also frequently used for supervised model selection, it does not perform well in such tasks. In this paper we study the performance of the marginal likelihood score empirically in supervised Bayesian network selection tasks by using a large number of publicly available classification data sets, and compare the results to those obtained by alternative model selection criteria, including empirical cross-validation methods, an approximation of a supervised marginal likelihood measure, and a supervised version of Dawid's prequential (predictive sequential) principle. The results demonstrate that the marginal likelihood score does not perform well for supervised model selection, while the best results are obtained by using Dawid's prequential approach.
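
To make the contrast between the two kinds of score concrete, the following is a minimal sketch and not the paper's implementation. It assumes a restricted structure family (a class variable with arcs to a chosen subset of discrete features, the remaining features being independent root nodes), complete data, and symmetric Dirichlet priors with a hypothetical hyperparameter `alpha`. The function name `sequential_scores` and all parameter names are illustrative assumptions. By the chain rule, summing the log predictive probability of each full case given the preceding cases yields the log marginal likelihood (the unsupervised score), while summing the log predictive probability of the class given the features yields a supervised prequential score.

```python
import math
from collections import defaultdict


def sequential_scores(data, n_classes, n_values, class_children, alpha=1.0):
    """Return (log_joint, log_cond) for one candidate structure.

    data           : list of (x, c); x is a tuple of discrete feature values,
                     c is a class label in range(n_classes).
    n_values       : number of possible values of each feature.
    class_children : indices of features that have an arc from the class;
                     all other features are independent root nodes.
    alpha          : symmetric Dirichlet hyperparameter (an assumption here).
    """
    class_counts = defaultdict(float)  # N(c)
    child_counts = defaultdict(float)  # N(feature j = v, class = c)
    root_counts = defaultdict(float)   # N(feature j = v)
    seen = 0
    log_joint = 0.0  # sum_i log P(x_i, c_i | earlier cases)
    log_cond = 0.0   # sum_i log P(c_i | x_i, earlier cases)

    for x, c in data:
        # Predictive joint log P(k, x | earlier cases) for every class k.
        log_p = []
        for k in range(n_classes):
            lp = math.log((class_counts[k] + alpha) / (seen + alpha * n_classes))
            for j, v in enumerate(x):
                if j in class_children:
                    num = child_counts[j, v, k] + alpha
                    den = class_counts[k] + alpha * n_values[j]
                else:
                    num = root_counts[j, v] + alpha
                    den = seen + alpha * n_values[j]
                lp += math.log(num / den)
            log_p.append(lp)

        # log P(x | earlier cases), via a numerically stable log-sum-exp.
        m = max(log_p)
        log_px = m + math.log(sum(math.exp(lp - m) for lp in log_p))

        log_joint += log_p[c]
        log_cond += log_p[c] - log_px

        # Only now reveal the case to the model (predict first, then update).
        class_counts[c] += 1
        for j, v in enumerate(x):
            child_counts[j, v, c] += 1
            root_counts[j, v] += 1
        seen += 1

    return log_joint, log_cond


if __name__ == "__main__":
    # Toy data (made up for illustration): feature 0 determines the class.
    toy = [((0, 1), 0), ((1, 1), 1), ((0, 0), 0),
           ((1, 0), 1), ((0, 1), 0), ((1, 1), 1)]
    for children in (set(), {0}, {1}, {0, 1}):
        joint, cond = sequential_scores(toy, 2, [2, 2], children)
        print(sorted(children), round(joint, 3), round(cond, 3))
```

Under these assumptions, a supervised selection would pick the structure with the highest `log_cond`, whereas an unsupervised selection would pick the highest `log_joint`; the two need not agree. The joint sum does not depend on the order of the cases, while the conditional sum generally does.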
