Bayesian Network structure learning: Hybridizing complete search with independence tests

Bayesian Networks (BNs) are probabilistic graphical models that compactly encode a joint probability distribution over a set of random variables. The NP-complete problem of finding the most probable BN structure given the observed data has been widely studied in recent years. Several complete algorithms have been proposed for this problem; in parallel, several tests of statistical independence between the random variables have been proposed to reduce the size of the search space. In this work, we study how to hybridize the state-of-the-art complete search algorithm with two types of independence tests, and we assess the performance of the two resulting hybrid algorithms in terms of both solution quality and computational time. Experimental results show that hybridization with either type of independence test yields a substantial gain in computational time at the cost of a limited loss in solution quality. They also allow us to provide some guidelines on the choice of test type, given the number of nodes in the network and the sample size.
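To illustrate how an independence test can shrink the search space before a structure search runs, the following is a minimal sketch rather than the method evaluated in this work. It assumes binary (0/1) variables and a Pearson chi-squared test of marginal independence; the abstract does not name the two test types studied, and the function names `chi2_independent` and `candidate_edges` are hypothetical. Pairs judged independent are dropped, so a subsequent search would only need to consider the remaining pairs as possible edges.

```python
import math
from itertools import combinations


def chi2_independent(xs, ys, alpha=0.05):
    """Pearson chi-squared test of marginal independence for two binary samples.

    Assumption: xs and ys are equal-length lists of 0/1 values.
    Returns True when independence is NOT rejected at significance level alpha.
    """
    n = len(xs)
    counts = [[0, 0], [0, 0]]
    for x, y in zip(xs, ys):
        counts[x][y] += 1

    row = [sum(counts[0]), sum(counts[1])]
    col = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # skip empty margins (constant variable)
                stat += (counts[i][j] - expected) ** 2 / expected

    # For a 2x2 table the statistic has 1 degree of freedom, and the
    # chi-squared survival function with 1 dof equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return p_value > alpha


def candidate_edges(data, alpha=0.05):
    """Return the unordered variable pairs for which independence is rejected.

    Assumption: data maps each variable name to a list of 0/1 observations.
    """
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(data), 2)
        if not chi2_independent(data[a], data[b], alpha)
    }


if __name__ == "__main__":
    samples = {
        "A": [0, 0, 1, 1] * 10,
        "B": [0, 0, 1, 1] * 10,  # identical to A, hence strongly dependent
        "C": [0, 1, 0, 1] * 10,  # balanced against A and B
    }
    print(candidate_edges(samples))
```

This sketch tests only marginal independence. Constraint-based methods typically also condition on subsets of the other variables, which is omitted here for brevity, and the complete search step itself is not shown.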
