Optimal Structure Identification With Greedy Search

In this paper we prove the so-called "Meek Conjecture". In particular, we show that if a DAG H is an independence map of another DAG G, then there exists a finite sequence of edge additions and covered edge reversals in G such that (1) after each edge modification H remains an independence map of G and (2) after all modifications G = H. As shown by Meek (1997), this result has an important consequence for Bayesian approaches to learning Bayesian networks from data: in the limit of large sample size, there exists a two-phase greedy search algorithm that, when applied to a particular sparsely connected search space, provably identifies a perfect map of the generative distribution if that perfect map is a DAG. We provide a new implementation of the search space, using equivalence classes as states, for which all operators used in the greedy search can be scored efficiently using local functions of the nodes in the domain. Finally, using both synthetic and real-world datasets, we demonstrate that the two-phase greedy approach leads to good solutions when learning with finite sample sizes.
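The transformational result hinges on covered edges: an edge X -> Y in a DAG is covered when X and Y have exactly the same parents apart from X itself, so reversing it preserves acyclicity and stays within the same Markov equivalence class. The sketch below is a minimal illustration of that condition, not the paper's implementation; the parent-map representation and the function names are assumptions made here for concreteness.

```python
# Minimal sketch (assumed representation, not the paper's code): a DAG is a map
# from each node to the set of its parents. An edge x -> y is "covered" when
# Pa(y) = Pa(x) ∪ {x}; reversing a covered edge keeps the graph acyclic and
# does not change the Markov equivalence class.
from typing import Dict, Set

DAG = Dict[str, Set[str]]  # node -> set of parent nodes


def is_covered(dag: DAG, x: str, y: str) -> bool:
    """True if the edge x -> y exists in `dag` and is covered."""
    if x not in dag.get(y, set()):
        return False              # no edge x -> y
    return dag[y] - {x} == dag[x]


def reverse_covered_edge(dag: DAG, x: str, y: str) -> DAG:
    """Reverse the covered edge x -> y, returning a new parent map."""
    assert is_covered(dag, x, y), "only covered edges are safe to reverse"
    new_dag = {v: set(parents) for v, parents in dag.items()}
    new_dag[y].discard(x)         # drop x -> y
    new_dag[x].add(y)             # add y -> x
    return new_dag


if __name__ == "__main__":
    # z -> x, z -> y, x -> y: here x -> y is covered (Pa(y) = {z, x}, Pa(x) = {z}).
    g = {"z": set(), "x": {"z"}, "y": {"z", "x"}}
    print(is_covered(g, "x", "y"))            # True
    print(reverse_covered_edge(g, "x", "y"))  # x -> y replaced by y -> x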

[1] L. M. M.-T. Theory of Probability, 1929, Nature.

[2] D. Haughton. On the Choice of a Model to Fit Data from an Exponential Family, 1988.

[3] Judea Pearl, et al. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1991, Morgan Kaufmann Series in Representation and Reasoning.

[4] Judea Pearl, et al. Equivalence and Synthesis of Causal Models, 1990, UAI.

[5] M. Tarsi, et al. A Simple Algorithm to Construct a Consistent Extension of a Partially Oriented Graph, 1992.

[6] David Maxwell Chickering, et al. Learning Bayesian Networks is NP-Complete, 1994.

[7] Christopher Meek, et al. Causal Inference and Causal Explanation with Background Knowledge, 1995, UAI.

[8] David Maxwell Chickering, et al. A Transformational Characterization of Equivalent Bayesian Network Structures, 1995, UAI.

[9] Wray L. Buntine. A Guide to the Literature on Learning Probabilistic Networks from Data, 1996, IEEE Trans. Knowl. Data Eng.

[10] David Maxwell Chickering, et al. Learning Equivalence Classes of Bayesian Network Structures, 1996, UAI.

[11] D. Madigan, et al. A Characterization of Markov Equivalence Classes for Acyclic Digraphs, 1997.

[12] C. Meek, et al. Graphical Models: Selecting Causal and Statistical Models, 1997.

[13] Ross D. Shachter. Bayes-Ball: The Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams), 1998, UAI.

[14] Michael D. Perlman, et al. Enumerating Markov Equivalence Classes of Acyclic Digraph Models, 2001, UAI.

[15] Milan Studený, et al. On Characterizing Inclusion of Bayesian Networks, 2001, UAI.

[16] D. Geiger, et al. Stratified Exponential Families: Graphical Models and Model Selection, 2001.

[17] David Maxwell Chickering, et al. Finding Optimal Bayesian Networks, 2002, UAI.

[18] Tom Burr, et al. Causation, Prediction, and Search, 2003, Technometrics.

[19] David Maxwell Chickering, et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, 1994, Machine Learning.

[20] Gregory F. Cooper, et al. A Bayesian Method for the Induction of Probabilistic Networks from Data, 1992, Machine Learning.