Towards an inclusion-driven learning of Bayesian Networks

Two or more Bayesian networks are Markov equivalent when their corresponding acyclic digraphs encode the same set of conditional independence (CI) restrictions. Therefore, the search space of Bayesian networks may be organized into equivalence classes, each of which corresponds to a particular set of CI restrictions. The collection of these sets of CI restrictions obeys a partial order, the graphical Markov model inclusion partial order, or inclusion order for short. This paper discusses in depth the role that the inclusion order plays in learning the structure of Bayesian networks. We prove that, under very special conditions, the traditional hill-climber always recovers the correct structure. Moreover, we extend the recent experimental results presented in (Kocka and Castelo, 2001). We show that learning algorithms for Bayesian networks that take the inclusion order into account perform better than those that do not, and we introduce two new such algorithms in the context of heuristic search and the MCMC method.
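The notions of Markov equivalence and inclusion can be made concrete with a minimal Python sketch. It assumes DAGs small enough that every CI statement can be enumerated. The representation (a dict mapping each node to its set of parents) and all function names are hypothetical choices made for illustration. `ci_model` lists every restriction x _||_ y | Z that a DAG encodes, using the moralization criterion of Lauritzen et al. (1990): Z separates x from y in the moral graph of the ancestral set of {x, y} ∪ Z. Two DAGs are Markov equivalent exactly when these sets coincide, and comparing the sets with subset inclusion illustrates the inclusion order. As a cross-check, `skeleton_and_vstructures` computes the invariants of the characterization by Verma and Pearl (1990): same skeleton and same v-structures. This sketch only illustrates these objects. It is not the learning procedure proposed in the paper.

```python
from itertools import combinations

# Hypothetical representation: a DAG is a dict mapping each node to the set of its parents.

def ancestral_set(dag, nodes):
    """Return `nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for parent in dag[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def d_separated(dag, x, y, z):
    """Moralization criterion: x _||_ y | z holds in the DAG iff z separates
    x from y in the moral graph of the ancestral set of {x, y} | z."""
    anc = ancestral_set(dag, {x, y} | set(z))
    adj = {v: set() for v in anc}
    for child in anc:
        parents = dag[child]  # parents of an ancestral node are ancestral too
        for p in parents:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(sorted(parents), 2):  # "marry" co-parents
            adj[a].add(b)
            adj[b].add(a)
    # Depth-first search from x that never enters z.
    seen, stack = {x}, [x]
    while stack:
        for w in adj[stack.pop()]:
            if w == y:
                return False
            if w not in seen and w not in z:
                seen.add(w)
                stack.append(w)
    return True

def ci_model(dag):
    """All CI restrictions (x, y, Z), with x < y, encoded by the DAG."""
    nodes = sorted(dag)
    model = set()
    for x, y in combinations(nodes, 2):
        rest = [v for v in nodes if v not in (x, y)]
        for k in range(len(rest) + 1):
            for z in combinations(rest, k):
                if d_separated(dag, x, y, frozenset(z)):
                    model.add((x, y, frozenset(z)))
    return model

def skeleton_and_vstructures(dag):
    """Verma-Pearl invariants: undirected edges and unshielded colliders a -> c <- b."""
    skeleton = {frozenset((p, c)) for c in dag for p in dag[c]}
    vstructs = {(frozenset((a, b)), c)
                for c in dag for a, b in combinations(sorted(dag[c]), 2)
                if frozenset((a, b)) not in skeleton}
    return skeleton, vstructs

chain    = {"a": set(), "b": {"a"}, "c": {"b"}}        # a -> b -> c
fork     = {"a": {"b"}, "b": set(), "c": {"b"}}        # a <- b -> c
collider = {"a": set(), "b": {"a", "c"}, "c": set()}   # a -> b <- c
complete = {"a": set(), "b": {"a"}, "c": {"a", "b"}}   # encodes no CI restriction

print(ci_model(chain) == ci_model(fork))          # True: Markov equivalent
print(ci_model(chain) == ci_model(collider))      # False: not equivalent
print(ci_model(complete) <= ci_model(collider))   # True: CI models are nested
print(skeleton_and_vstructures(chain) == skeleton_and_vstructures(fork))  # True
```

The chain and the fork both encode only a _||_ c | b, so they fall in the same equivalence class. The collider instead encodes the marginal restriction a _||_ c. The complete DAG encodes no restriction at all, so its CI model is contained in that of every other DAG on the same nodes. Because the enumeration is exponential in the number of nodes, this brute-force approach is only useful for inspecting small examples.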

[1] M. Frydenberg. The chain graph Markov property, 1990.

[2] Robert Castelo, et al. Improved learning of Bayesian networks, 2001, UAI.

[3] Moninder Singh, et al. An Algorithm for the Construction of Bayesian Network Structures from Data, 1993, UAI.

[4] Kai Lai Chung, et al. Markov Chains with Stationary Transition Probabilities, 1961.

[5] A. P. Dawid, et al. Independence properties of directed Markov fields, 1990, Networks, 20, 491-505.

[6] G. Melançon, et al. Random generation of DAGs for graph drawing, 2000.

[7] Pedro Larrañaga, et al. Learning Bayesian network structures by searching for the best ordering with genetic algorithms, 1996, IEEE Trans. Syst. Man Cybern. Part A.

[8] Huaiyu Zhu. On Information and Sufficiency, 1997.

[9] Michael D. Perlman, et al. Enumerating Markov Equivalence Classes of Acyclic Digraph Models, 2001, UAI.

[10] Tom Burr, et al. Causation, Prediction, and Search, 2003, Technometrics.

[11] Nir Friedman, et al. Being Bayesian about Network Structure, 2000, UAI.

[12] Gregory F. Cooper, et al. A Bayesian method for the induction of probabilistic networks from data, 1992, Machine Learning.

[13] David Heckerman, et al. Learning Bayesian Networks: Search Methods and Experimental Results, 1995.

[14] Remco R. Bouckaert, et al. Optimizing Causal Orderings for Generating DAGs from Data, 1992, UAI.

[15] C. Meek, et al. Graphical models: selecting causal and statistical models, 1997.

[16] Gregory F. Cooper, et al. The ALARM Monitoring System: A Case Study with two Probabilistic Inference Techniques for Belief Networks, 1989, AIME.

[17] S. Chib, et al. Understanding the Metropolis-Hastings Algorithm, 1995.

[18] V. Rich. Personal communication, 1989, Nature.

[19] D. Madigan, et al. A characterization of Markov equivalence classes for acyclic digraphs, 1997.

[20] Michael I. Jordan. Graphical Models, 1998.

[21] J. York, et al. Bayesian Graphical Models for Discrete Data, 1995.

[22] David Maxwell Chickering, et al. Learning Equivalence Classes of Bayesian Network Structures, 1996, UAI.

[23] D. Madigan, et al. Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs, 1996.

[24] W. K. Hastings, et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.

[25] Judea Pearl, et al. The Logic of Representing Dependencies by Directed Graphs, 1987, AAAI.

[26] G. Schwarz. Estimating the Dimension of a Model, 1978.

[27] Milan Studený, et al. On characterizing Inclusion of Bayesian Networks, 2001, UAI.

[28] David Draper, et al. Assessment and Propagation of Model Uncertainty, 2011.

[29] David Maxwell Chickering, et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, 1994, Machine Learning.

[30] Adrian F. M. Smith, et al. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion), 1993.

[31] Paolo Giudici, et al. Improving Markov Chain Monte Carlo Model Search for Data Mining, 2004, Machine Learning.

[32] Judea Pearl, et al. Probabilistic reasoning in intelligent systems, 1988.

[33] Judea Pearl, et al. Equivalence and Synthesis of Causal Models, 1990, UAI.

[34] Christopher Meek, et al. Learning Bayesian Networks with Discrete Variables from Data, 1995, KDD.

[35] N. Metropolis, et al. Equation of State Calculations by Fast Computing Machines, 1953, Resonance.

[36] Steffen L. Lauritzen, et al. Independence properties of directed Markov fields, 1990, Networks.

[37] Edward H. Herskovits, et al. Computer-based probabilistic-network construction, 1992.

[38] Wray L. Buntine. Theory Refinement on Bayesian Networks, 1991, UAI.

[39] David Maxwell Chickering, et al. Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables, 1997, Machine Learning.