Learning Bayesian Belief Networks Based on the MDL Principle: An Efficient Algorithm Using the Branch and Bound Technique

In this paper, we address the computational issue in learning Bayesian belief networks (BBNs) based on the minimum description length (MDL) principle. Based on an asymptotic formula for the description length, we apply the branch and bound technique to finding true network structures. The resulting algorithm considerably reduces computation yet successfully finds the network structure that minimizes the formula. Thus far, no search algorithm has been able to find the optimal solution, in the sense of maximum posterior probability, over a set of network structures for examples of practical size, and heuristic searches such as K2 and K3 become trapped in local optima due to their greedy nature even when the sample size is large. Since the proposed algorithm minimizes the description length, it eventually selects the true network structure as the sample size goes to infinity. key words: Bayesian belief networks, minimum description length (MDL) principle, branch and bound technique, Cooper and Herskovits procedure, MDL-based procedure, K2, K3
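To make the approach concrete, here is a minimal sketch of the two ingredients the abstract describes: an asymptotic MDL score for one variable given a candidate parent set (empirical conditional entropy plus a (k/2) log n parameter penalty), and a parent-set search that prunes candidates whose penalty term alone already exceeds the incumbent score, in the spirit of branch and bound. All function names and the tuple-based data layout are illustrative assumptions, not the paper's actual implementation.

```python
import itertools
import math
from collections import Counter

def mdl_score(data, child, parents, arities):
    """Asymptotic MDL description length of `child` given `parents`:
    negative log-likelihood (in bits) plus (k/2) * log2(n), where k is
    the number of free conditional-probability parameters.
    `data` is a list of tuples; `arities[i]` is the state count of variable i.
    NOTE: illustrative sketch, not the paper's exact formula or code."""
    n = len(data)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log2(c / marg[pa]) for (pa, _), c in joint.items())
    q = 1
    for p in parents:
        q *= arities[p]          # number of parent configurations
    k = (arities[child] - 1) * q  # free parameters in the CPT
    return -loglik + 0.5 * k * math.log2(n)

def best_parents(data, child, candidates, arities):
    """Search over parent sets with branch-and-bound style pruning:
    since -loglik >= 0, the penalty term alone is a lower bound on the
    score, so any set whose penalty reaches the incumbent is skipped."""
    n = len(data)
    best = (mdl_score(data, child, (), arities), ())
    for r in range(1, len(candidates) + 1):
        for ps in itertools.combinations(candidates, r):
            q = 1
            for p in ps:
                q *= arities[p]
            penalty = 0.5 * (arities[child] - 1) * q * math.log2(n)
            if penalty >= best[0]:
                continue  # pruned: cannot beat the incumbent score
            s = mdl_score(data, child, ps, arities)
            if s < best[0]:
                best = (s, ps)
    return best
```

With deterministic data where variable 1 perfectly predicts variable 0, the search selects `{1}` as the parent set, because the zero-entropy likelihood term outweighs the extra parameter penalty once the sample is large enough; this is the consistency property the abstract claims in the limit.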
