Learning with Bayesian networks and probability trees to approximate a joint distribution

Most learning algorithms for Bayesian networks try to minimize the number of structural errors (missing, added, or inverted links in the learned graph with respect to the true one). In this paper we assume instead that the objective of the learning task is to approximate the joint probability distribution of the data. For this objective, previous experiments have shown that representing the conditional probability distribution of each node given its parents with probability trees provides better results than learning with probability tables. When approximating a joint distribution, structure and parameter learning cannot be treated as separate tasks, so we have to evaluate the performance of combined procedures for inducing both structure and parameters. We carry out an experimental evaluation of several combined strategies based on trees and tables using a greedy hill-climbing algorithm, and compare the results with a restricted search procedure (the Max-Min hill-climbing algorithm).
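The central representational device, a probability tree in place of a full conditional probability table, can be sketched as follows. This is a minimal illustration under assumed names (Leaf, Node, query, and the example variables A, B, X are not from the paper): internal nodes test parent variables, leaves hold a distribution over the child variable, and context-specific independence lets an entire branch collapse to a single leaf.

    from dataclasses import dataclass
    from typing import Dict, Union

    @dataclass
    class Leaf:
        dist: Dict[str, float]        # distribution over the child variable X

    @dataclass
    class Node:
        var: str                      # parent variable tested at this node
        children: Dict[int, Union["Node", Leaf]]  # one branch per value

    def query(tree: Union[Node, Leaf], parents: Dict[str, int]) -> Dict[str, float]:
        """Walk the tree using the parents' values and return P(X | parents)."""
        while isinstance(tree, Node):
            tree = tree.children[parents[tree.var]]
        return tree.dist

    # P(X | A, B): a full table needs 4 conditional distributions; the tree
    # needs only 3 leaves because X is independent of B in the context A = 1.
    cpd = Node("A", {
        0: Node("B", {
            0: Leaf({"x0": 0.9, "x1": 0.1}),
            1: Leaf({"x0": 0.4, "x1": 0.6}),
        }),
        1: Leaf({"x0": 0.5, "x1": 0.5}),  # shared for B = 0 and B = 1
    })

    print(query(cpd, {"A": 1, "B": 0}))   # {'x0': 0.5, 'x1': 0.5}

With fewer free parameters per node, the tree-based representation can be re-estimated reliably from less data during the structure search, which is one plausible reason the paper finds trees useful when the score is the quality of the approximated joint distribution (e.g., test-set log-likelihood) rather than structural error counts.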
