Optimal Reinsertion: A New Search Operator for Accelerated and More Accurate Bayesian Network Structure Learning

We show how a conceptually simple search operator called Optimal Reinsertion can be applied to learning Bayesian network structure from data. At each step we pick a node, called the target, and delete all arcs entering or exiting it. We then find, subject to some constraints, the globally optimal combination of in-arcs and out-arcs with which to reinsert it. The heart of the paper is a new algorithm called ORSearch, which allows each optimal reinsertion step to be computed efficiently on large datasets. Our empirical results compare Optimal Reinsertion against a highly tuned implementation of multi-restart hill climbing. Optimal Reinsertion typically achieves a one to two order of magnitude speed-up on a variety of datasets, and it usually produces better final networks, both in terms of BDeu score and in how well they model future data drawn from the same distribution.
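
The operator itself is easy to state, and the sketch below (Python) spells out one sweep over the nodes under naive assumptions: local_score is a hypothetical decomposable family score (for example, one BDeu term), max_parents is an assumed constraint, parent sets are enumerated exhaustively, and out-arcs are added greedily rather than jointly optimized. None of this is the paper's ORSearch algorithm, which computes the optimal reinsertion efficiently instead of by brute-force enumeration.

from itertools import combinations


def optimal_reinsertion_pass(dag, local_score, max_parents=3):
    """One sweep of the Optimal Reinsertion operator (naive sketch).

    `dag` maps each node to the set of its parents.  `local_score(node,
    parents)` is a hypothetical decomposable family score (e.g. one BDeu
    term); the paper's ORSearch computes the optimal reinsertion without
    the exhaustive enumeration used here.
    """
    nodes = list(dag)
    for target in nodes:
        # Step 1: delete every arc entering or exiting the target.
        dag[target] = set()
        for node in nodes:
            dag[node].discard(target)

        # Step 2: pick the best set of in-arcs (parents) for the target.
        # After the deletion the target has no descendants, so any parent
        # set of bounded size keeps the graph acyclic.
        others = [n for n in nodes if n != target]
        best_parents, best = set(), local_score(target, frozenset())
        for k in range(1, max_parents + 1):
            for parents in combinations(others, k):
                s = local_score(target, frozenset(parents))
                if s > best:
                    best_parents, best = set(parents), s
        dag[target] = best_parents

        # Step 3: add out-arcs (target -> child) greedily whenever the arc
        # improves the child's family score and does not close a cycle.
        # (The paper optimizes in-arcs and out-arcs jointly; this greedy
        # pass is a simplification.)
        for child in others:
            if reaches(dag, child, target):
                continue  # child already reaches target: arc would cycle
            larger = frozenset(dag[child] | {target})
            if (len(larger) <= max_parents and
                    local_score(child, larger) >
                    local_score(child, frozenset(dag[child]))):
                dag[child].add(target)
    return dag


def reaches(dag, src, dst):
    """True if a directed path src -> ... -> dst exists in `dag`."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        # the children of `node` are the nodes that list it as a parent
        for child in (n for n in dag if node in dag[n]):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False

Even this naive version makes the contrast with single-arc hill climbing clear: one reinsertion step reconsiders the target's entire neighborhood at once rather than changing one arc at a time.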
