Learning Bayesian Networks under Equivalence Constraints (Abstract)

We propose an approach for learning parameters in Bayesian networks from incomplete datasets that are subject to equivalence constraints. These constraints arise in datasets where examples are tied together: we may not know the value of a particular variable, but we know that, whatever it is, it must be the same across different examples. We formalize the problem by defining a constrained dataset (a dataset with equivalence constraints) and a corresponding constrained likelihood that we seek to optimize. We derive an EM algorithm for estimating parameters from constrained datasets, which reduces to the vanilla EM algorithm when the dataset has no constraints. Finally, we evaluate our general approach on clustering problems from semi-supervised learning, showing that it is competitive with more specialized approaches.
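
To make the setting above concrete, the following is a minimal sketch of EM with equivalence constraints on a toy model, not the algorithm derived in the paper. It assumes a two-class network C -> X with a hidden class C and a single binary feature X, and it assumes that the shared class is drawn once per group of tied examples; the paper's constrained likelihood may treat the prior differently. The function name `constrained_em`, its parameters, and the grouping of examples into lists are all hypothetical choices made for illustration.

```python
def constrained_em(groups, n_iter=50, prior=(0.5, 0.5), theta=(0.3, 0.7)):
    """EM for a toy model C -> X with hidden binary C and binary X.

    groups: list of lists of 0/1 observations. All examples in one group
    are assumed to share the same unknown value of C (an equivalence
    constraint). A group of size 1 is an ordinary unconstrained example.
    Assumption: the shared value of C is drawn once per group.
    """
    eps = 1e-6
    prior = list(prior)   # prior[c] = P(C = c)
    theta = list(theta)   # theta[c] = P(X = 1 | C = c)
    for _ in range(n_iter):
        class_count = [0.0, 0.0]  # expected number of groups per class
        ones = [0.0, 0.0]         # expected count of X = 1 per class
        total = [0.0, 0.0]        # expected count of examples per class
        # E-step: one posterior over the shared C per group.
        for xs in groups:
            w = []
            for c in range(2):
                lik = prior[c]
                for x in xs:
                    lik *= theta[c] if x == 1 else 1.0 - theta[c]
                w.append(lik)
            z = w[0] + w[1]
            for c in range(2):
                post = w[c] / z
                class_count[c] += post
                ones[c] += post * sum(xs)
                total[c] += post * len(xs)
        # M-step: re-estimate parameters from expected counts.
        n = len(groups)
        prior = [class_count[c] / n for c in range(2)]
        for c in range(2):
            if total[c] > 0:
                # Clamp away from 0 and 1 so likelihoods stay positive.
                theta[c] = min(max(ones[c] / total[c], eps), 1.0 - eps)
    return prior, theta


# Four groups of tied examples; each group shares one hidden class.
groups = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0], [0, 0, 1, 0]]
prior, theta = constrained_em(groups)
print(prior, theta)
```

When every group contains a single example, the E-step computes one posterior per example and the updates coincide with ordinary EM for this model, which mirrors the reduction to vanilla EM stated above.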