Interpolating Conditional Density Trees

Joint distributions over many variables are frequently modeled by decomposing them into products of simpler, lower-dimensional conditional distributions, such as in sparsely connected Bayesian networks. However, automatically learning such models can be very computationally expensive when there are many datapoints and many continuous variables with complex nonlinear relationships, particularly when no good ways of decomposing the joint distribution are known a priori. In such situations, previous research has generally focused on the use of discretization techniques in which each continuous variable has a single discretization that is used throughout the entire network. In this paper, we present and compare a wide variety of tree-based algorithms for learning and evaluating conditional density estimates over continuous variables. These trees can be thought of as discretizations that vary according to the particular interactions being modeled; however, the density within a given leaf of the tree need not be assumed constant, and we show that such nonuniform leaf densities lead to more accurate density estimation. We have developed Bayesian network structure-learning algorithms that employ these tree-based conditional density representations, and we show that they can be used to practically learn complex joint probability models over dozens of continuous variables from thousands of datapoints. We focus on finding models that are simultaneously accurate, fast to learn, and fast to evaluate once they are learned.
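To make the core idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of a conditional density tree for P(y | x): internal nodes split on a parent variable, and each leaf holds a nonuniform density over the child variable rather than a constant. The class names (Leaf, Split, fit_tree), the choice of a single Gaussian as the leaf density, and the median-split, training-log-likelihood scoring are all illustrative assumptions; the paper itself compares a much wider range of leaf densities and splitting strategies.

```python
# Illustrative sketch of a conditional density tree for P(y | x).
# Assumptions: Gaussian leaf densities, median splits on parent variables,
# splits scored by training log-likelihood. Not the paper's exact method.
import math
import numpy as np


class Leaf:
    """Leaf node: a nonuniform (here, Gaussian) density over y."""

    def __init__(self, y):
        self.mean = float(np.mean(y))
        self.std = max(float(np.std(y)), 1e-6)  # avoid zero variance

    def log_density(self, y, x=None):
        z = (y - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std * math.sqrt(2.0 * math.pi))


class Split:
    """Internal node: route on one parent variable, then recurse."""

    def __init__(self, dim, threshold, left, right):
        self.dim, self.threshold = dim, threshold
        self.left, self.right = left, right

    def log_density(self, y, x):
        child = self.left if x[self.dim] <= self.threshold else self.right
        return child.log_density(y, x)


def fit_tree(x, y, depth=0, max_depth=3, min_leaf=20):
    """Greedily grow a conditional density tree by median splits on parents."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return Leaf(y)
    base = sum(Leaf(y).log_density(v) for v in y)
    best = None
    for dim in range(x.shape[1]):
        thr = float(np.median(x[:, dim]))
        mask = x[:, dim] <= thr
        if mask.sum() < min_leaf or (~mask).sum() < min_leaf:
            continue
        score = (sum(Leaf(y[mask]).log_density(v) for v in y[mask]) +
                 sum(Leaf(y[~mask]).log_density(v) for v in y[~mask]))
        if score > base and (best is None or score > best[0]):
            best = (score, dim, thr, mask)
    if best is None:
        return Leaf(y)
    _, dim, thr, mask = best
    return Split(dim, thr,
                 fit_tree(x[mask], y[mask], depth + 1, max_depth, min_leaf),
                 fit_tree(x[~mask], y[~mask], depth + 1, max_depth, min_leaf))


if __name__ == "__main__":
    # Toy usage: y depends nonlinearly on the first of three parent variables.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2000, 3))
    y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=2000)
    tree = fit_tree(x, y)
    print(tree.log_density(0.0, np.zeros(3)))
```

The point of the sketch is only the structure the abstract describes: the discretization induced by the splits adapts to the particular parent-child interaction being modeled, and each leaf carries its own fitted (nonuniform) density rather than a single global discretization with constant-density bins.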
