In this paper, we derive optimality results for greedy Bayesian-network search algorithms that perform single-edge modifications at each step and use asymptotically consistent scoring criteria. Our results extend those of Meek (1997) and Chickering (2002), who demonstrate that, in the limit of large datasets, if the generative distribution is perfect with respect to a DAG defined over the observable variables, such search algorithms will identify this optimal (i.e., generative) DAG model. We relax their assumption about the generative distribution and assume only that it satisfies the composition property over the observable variables, which is a more realistic assumption for real domains. Under this assumption, we guarantee that the search algorithms identify an inclusion-optimal model; that is, a model that (1) contains the generative distribution and (2) has no sub-model that contains this distribution. In addition, we show that the composition property is guaranteed to hold whenever the dependence relationships in the generative distribution can be characterized by paths between singleton elements in some generative graphical model (e.g., a DAG, a chain graph, or a Markov network), even when the generative model includes unobserved variables and even when the observed data are subject to selection bias.
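For reference, the composition property mentioned above is usually stated as follows. This is a sketch of the standard formulation from the graphoid literature; the set names X, Y, W, Z and the independence symbol are notation assumed here, not taken from the abstract.

```latex
% Composition (assumed notation): X, Y, W, Z are disjoint sets of observable variables,
% and A \perp B \mid C reads "A is independent of B given C".
\[
  (X \perp Y \mid Z) \;\wedge\; (X \perp W \mid Z)
  \;\Longrightarrow\;
  X \perp (Y \cup W) \mid Z
\]
% Contrapositive form: a dependence on a set implies a dependence on
% at least one of its singleton elements.
\[
  X \not\perp Y \mid Z
  \;\Longrightarrow\;
  \exists\, y \in Y :\; X \not\perp \{y\} \mid Z
\]
```

The contrapositive form is the one that matters for single-edge search: any dependence between sets of variables can be detected through at least one individual variable.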
[1] D. Haughton. On the Choice of a Model to Fit Data from an Exponential Family. 1988.
[2] Judea Pearl, et al. Probabilistic reasoning in intelligent systems - networks of plausible inference. Morgan Kaufmann series in representation and reasoning, 1991.
[3] David Maxwell Chickering, et al. Learning Bayesian Networks is NP-Complete. AISTATS, 2016.
[4] David Maxwell Chickering, et al. A Transformational Characterization of Equivalent Bayesian Network Structures. UAI, 1995.
[5] Sanjoy Dasgupta, et al. Learning Polytrees. UAI, 1999.
[6] Christopher Meek, et al. Finding a path is harder than finding a tree. AISTATS, 2001.
[7] D. Geiger, et al. Stratified exponential families: Graphical models and model selection. 2001.
[8] David Maxwell Chickering, et al. Optimal Structure Identification With Greedy Search. J. Mach. Learn. Res., 2003.
[9] David Maxwell Chickering, et al. The Road to Asymptopia. 2002.
[10] David Maxwell Chickering, et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Machine Learning, 1994.