On Inclusion-Driven Learning of Bayesian Networks

Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. The search space of Bayesian network structures can therefore be organized into equivalence classes, each representing a different set of conditional independencies. The collection of these sets of conditional independencies is partially ordered by the so-called "inclusion order." This paper discusses in depth the role that the inclusion order plays in learning the structure of Bayesian networks, in particular how it bears on the way a learning algorithm traverses the search space. We introduce a condition for traversal operators, the inclusion boundary condition, which, when satisfied, guarantees that the search strategy can avoid local maxima. This is proved under the assumptions that the data are sampled from a probability distribution that is faithful to an acyclic digraph and that the sample size is unbounded. This discussion leads to the design of a new traversal operator and two new learning algorithms in the context of heuristic search and the Markov chain Monte Carlo method. Experiments with synthetic and real-world data show empirically the benefit of respecting the inclusion order when learning Bayesian networks from data.
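
As a small illustration of Markov equivalence, the sketch below tests whether two acyclic digraphs fall in the same equivalence class. It relies on the standard graphical characterization (two acyclic digraphs over the same vertices are Markov equivalent exactly when they share the same skeleton and the same v-structures), which the abstract does not state. The dictionary representation and the function names are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations

# Assumed representation: a DAG is a dict mapping each node to the set of
# its parents. Both graphs are assumed to be acyclic and to share the same
# node set; neither property is checked here.

def skeleton(dag):
    """Return the undirected edges of the DAG as a set of frozensets."""
    return {frozenset((parent, child))
            for child, parents in dag.items()
            for parent in parents}

def v_structures(dag):
    """Return triples (a, c, b) with a -> c <- b where a and b are non-adjacent."""
    skel = skeleton(dag)
    found = set()
    for child, parents in dag.items():
        for a, b in combinations(sorted(parents), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(g, h):
    """Same skeleton and same v-structures (standard criterion, see lead-in)."""
    return skeleton(g) == skeleton(h) and v_structures(g) == v_structures(h)

# Three hypothetical structures over the nodes A, B, C.
chain = {"A": set(), "B": {"A"}, "C": {"B"}}           # A -> B -> C
reverse_chain = {"C": set(), "B": {"C"}, "A": {"B"}}   # A <- B <- C
collider = {"A": set(), "C": set(), "B": {"A", "C"}}   # A -> B <- C

print(markov_equivalent(chain, reverse_chain))  # both encode only "A independent of C given B"
print(markov_equivalent(chain, collider))       # the collider encodes a different independence
```

The two chains land in one equivalence class, while the collider sits in another, which is why a search over equivalence classes visits fewer distinct states than a search over individual digraphs.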
