Learning Bayesian network structure: Towards the essential graph by integer linear programming tools

The basic idea of the geometric approach to learning a Bayesian network (BN) structure is to represent every BN structure by a certain vector. If the vector representative is chosen properly, the task of finding the global maximum of a score over BN structures can be re-formulated as an integer linear programming (ILP) problem. One such suitable zero-one vector representative is the characteristic imset, introduced by Studený, Hemmecke and Lindner in 2010 in the proceedings of the 5th PGM workshop. In this paper, we consider extensions of characteristic imsets that additionally encode chain graphs without flags that are equivalent to acyclic directed graphs. The main contribution is a polyhedral description of the respective domain of the ILP problem, that is, a description by means of a set of linear inequalities. This theoretical result opens the way to applying ILP software packages. The advantage of our approach is that, as a by-product of the ILP optimization procedure, one may obtain the essential graph, which is a traditional graphical representative of a BN structure. We also describe some computational experiments based on this idea.
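To make the notion of a zero-one vector representative concrete, the following Python sketch computes the characteristic imset of a small acyclic directed graph directly from its definition, as we understand it from Studený, Hemmecke and Lindner (2010): for every set S of at least two variables, c(S) = 1 if some node i in S has all other members of S among its parents, and c(S) = 0 otherwise. The function names `characteristic_imset` and `skeleton_and_immoralities` are hypothetical, and the graphs are toy examples. The sketch covers only the basic characteristic imset, not the extensions encoding chain graphs or the polyhedral description studied in this paper. It illustrates that Markov equivalent graphs share the same imset, and that the skeleton and the immoralities, which together determine the essential graph, can be read off the components for two- and three-element sets.

```python
from itertools import combinations


def characteristic_imset(parents):
    """Characteristic imset of a DAG given as {node: set_of_parents}.

    Hypothetical helper: for every subset S of nodes with |S| >= 2,
    c(S) = 1 iff some i in S has S - {i} contained in pa(i), else 0.
    """
    nodes = sorted(parents)
    imset = {}
    for size in range(2, len(nodes) + 1):
        for subset in combinations(nodes, size):
            s = frozenset(subset)
            # Is there a node in S whose parent set covers the rest of S?
            imset[s] = int(any(s - {i} <= parents[i] for i in s))
    return imset


def skeleton_and_immoralities(imset):
    """Read the skeleton and the immoralities off a characteristic imset.

    {a, b} is an edge iff c({a, b}) = 1; a -> h <- b is an immorality iff
    c({a, b, h}) = 1 while c({a, b}) = 0.
    """
    edges = {s for s, value in imset.items() if len(s) == 2 and value == 1}
    immoralities = set()
    for s, value in imset.items():
        if len(s) != 3 or value != 1:
            continue
        for head in s:
            a, b = sorted(s - {head})
            if frozenset((a, b)) not in edges:
                # a and b are non-adjacent, so head must be their common child
                immoralities.add((a, head, b))
    return edges, immoralities


# Toy DAGs over {a, b, c}, each given as {node: set_of_parents}.
chain = {"a": set(), "b": {"a"}, "c": {"b"}}          # a -> b -> c
reverse_chain = {"a": {"b"}, "b": {"c"}, "c": set()}  # a <- b <- c
collider = {"a": set(), "b": {"a", "c"}, "c": set()}  # a -> b <- c

print(characteristic_imset(chain) == characteristic_imset(reverse_chain))  # True
print(characteristic_imset(chain) == characteristic_imset(collider))       # False
print(skeleton_and_immoralities(characteristic_imset(collider))[1])        # {('a', 'b', 'c')}
```

Enumerating all subsets is exponential in the number of variables, so this is only meant for tiny illustrative graphs. In the ILP approach described above, the imset components are instead the zero-one variables of the optimization problem rather than quantities computed from a given graph.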

[1] James Cussens et al. Bayesian network learning with cutting planes, 2011, UAI.

[2] Qiang Ji et al. Efficient Structure Learning of Bayesian Networks using Constraints, 2011, J. Mach. Learn. Res.

[3] D. Madigan et al. A characterization of Markov equivalence classes for acyclic digraphs, 1997.

[4] C.J.H. Mann et al. Probabilistic Conditional Independence Structures, 2005.

[5] M. Studený et al. Integer Linear Programming Approach to Learning Bayesian Network Structure: towards the Essential Graph, 2012.

[6] Michael I. Jordan. Graphical Models, 1998.

[7] M. Studený et al. Characteristic imset: a simple algebraic representative of a Bayesian network structure, 2010.

[8] Andrés R. Masegosa et al. Locally averaged Bayesian Dirichlet metrics for learning the structure and the parameters of Bayesian networks, 2013, Int. J. Approx. Reason.

[9] Milan Studený. Probabilistic Conditional Independence Structures: With 42 Illustrations (Information Science and Statistics), 2004.

[10] Qiang Ji et al. Structure learning of Bayesian networks using constraints, 2009, ICML '09.

[11] Silvia Lindner et al. Discrete optimisation in machine learning: learning of Bayesian network structures and conditional independence implication, 2012.

[12] Milan Studený et al. A recovery algorithm for chain graphs, 1997, Int. J. Approx. Reason.

[13] David Maxwell Chickering et al. Optimal Structure Identification With Greedy Search, 2002, J. Mach. Learn. Res.

[14] Milan Studený et al. Polyhedral Approach to Statistical Learning Graphical Models, 2012.

[15] Milan Studený et al. On Polyhedral Approximations of Polytopes for Learning Bayesian Networks, 2013.

[16] G. Schwarz. Estimating the Dimension of a Model, 1978.

[17] Judea Pearl et al. Equivalence and Synthesis of Causal Models, 1990, UAI.

[18] David Maxwell Chickering et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, 1994, Machine Learning.

[19] A. Land et al. Computer Codes for Problems of Integer Programming, 1979.

[20] Daphne Koller et al. Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks, 2005, UAI.

[21] Milan Studený et al. Characteristic imsets for learning Bayesian network structure, 2012, Int. J. Approx. Reason.

[22] Milan Studený. Characterization Of Essential Graphs By Means Of The Operation Of Legal Merging Of Components, 2004, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[23] James Cussens et al. Maximum likelihood pedigree reconstruction using integer programming, 2010, WCB@ICLP.

[24] Tommi S. Jaakkola et al. Learning Bayesian Network Structure using LP Relaxations, 2010, AISTATS.

[25] James Cussens et al. Maximum Likelihood Pedigree Reconstruction Using Integer Linear Programming, 2013, Genetic Epidemiology.

[26] Thorsten Koch et al. Branching rules revisited, 2005, Oper. Res. Lett.

[27] Martin W. P. Savelsbergh et al. A Computational Study of Search Strategies for Mixed Integer Programming, 1999, INFORMS J. Comput.

[28] Richard E. Neapolitan et al. Learning Bayesian networks, 2007, KDD '07.

[29] M. Studený. LP relaxations and pruning for characteristic imsets, 2012.

[30] Jiří Vomlel et al. A geometric view on learning Bayesian network structures, 2010, Int. J. Approx. Reason.