Efficient learning of hierarchical latent class models

Hierarchical latent class (HLC) models are tree-structured Bayesian networks in which leaf nodes are observed and internal nodes are hidden. In earlier work, we demonstrated that it is possible in principle to reconstruct HLC models from data. Here we address the scalability issue and develop a search-based algorithm that can efficiently learn high-quality HLC models for realistic domains. There are three technical contributions: (1) the identification of a set of search operators; (2) the use of improvement in BIC score per unit of increase in model complexity, rather than the BIC score itself, for model selection; and (3) the adaptation of structural EM to situations where candidate models contain variables different from those in the current model. The algorithm was tested on the COIL Challenge 2000 data set and an interesting model was found.
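
Contribution (2) can be illustrated with a minimal sketch. It assumes the standard form of the BIC score (maximized log-likelihood minus half the number of free parameters times the log of the sample size) and measures model complexity by the number of free parameters. The function names, the dictionary layout, and the numbers are hypothetical and chosen only for illustration. Skipping candidates that do not add parameters is also an assumption of this sketch, not something stated above.

```python
import math

def bic(loglik, dim, n):
    """Standard BIC: maximized log-likelihood minus a complexity penalty."""
    return loglik - 0.5 * dim * math.log(n)

def pick_candidate(current, candidates, n):
    """Return the name of the candidate with the largest BIC improvement
    per added free parameter, or None if no candidate improves BIC."""
    base = bic(current["loglik"], current["dim"], n)
    best_name, best_ratio = None, 0.0
    for c in candidates:
        extra = c["dim"] - current["dim"]
        if extra <= 0:
            continue  # assumption: this sketch only ranks candidates that add parameters
        ratio = (bic(c["loglik"], c["dim"], n) - base) / extra
        if ratio > best_ratio:
            best_name, best_ratio = c["name"], ratio
    return best_name

# Hypothetical numbers: A has the larger raw BIC gain, B the larger gain per parameter.
current = {"name": "m0", "loglik": -5000.0, "dim": 20}
candidates = [
    {"name": "A", "loglik": -4800.0, "dim": 40},
    {"name": "B", "loglik": -4950.0, "dim": 24},
]
chosen = pick_candidate(current, candidates, n=1000)
by_raw_bic = max(candidates, key=lambda c: bic(c["loglik"], c["dim"], 1000))["name"]
```

With these numbers, ranking by raw BIC would select candidate A, while ranking by improvement per added parameter selects candidate B, which buys its gain with far fewer extra parameters.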