Learning Bayesian Networks with Local Structure

In this paper we examine a novel addition to the known methods for learning Bayesian networks from data, one that improves the quality of the learned networks. Our approach explicitly represents and learns the local structure in the conditional probability tables (CPTs) that quantify these networks. This increases the space of possible models, enabling the representation of CPTs with a variable number of parameters that depends on the learned local structures. The resulting learning procedure can induce models that better emulate the true complexity of the interactions present in the data. We describe the theoretical foundations and practical aspects of learning local structures, as well as an empirical evaluation of the proposed method. This evaluation indicates that learning curves characterizing the procedure that exploits local structure converge faster than those of the standard procedure. Our results also show that networks learned with local structure tend to be more complex (in terms of arcs), yet require fewer parameters.
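To illustrate why local structure yields CPTs with a variable number of parameters, consider a minimal sketch (not the paper's implementation; the variable names, the nested-dict encoding, and the particular probabilities are hypothetical). A full CPT for a binary child with three binary parents needs one free parameter per parent configuration, whereas a tree-structured CPT that captures context-specific independence shares one distribution across many configurations:

```python
def full_cpt_params(n_parents, arity=2):
    # A full table needs one free parameter (for a binary child)
    # per joint configuration of the parents.
    return arity ** n_parents

# A tree-CPT as nested dicts: internal nodes split on a parent,
# leaves hold a single distribution shared by many configurations.
# Hypothetical example: P(X=1 | A, B, C) where X is independent of
# B and C in the context A=1 (context-specific independence).
tree_cpt = {
    "split": "A",
    0: {"split": "B",
        0: {"leaf": 0.9},
        1: {"leaf": 0.3}},
    1: {"leaf": 0.5},  # one parameter covers all values of B and C
}

def count_leaves(node):
    # Number of free parameters = number of leaves in the tree.
    if "leaf" in node:
        return 1
    return sum(count_leaves(child)
               for key, child in node.items() if key != "split")

def lookup(node, config):
    # Walk from the root to the leaf matching a parent configuration.
    while "leaf" not in node:
        node = node[config[node["split"]]]
    return node["leaf"]

print(full_cpt_params(3))                          # 8 parameters in the full table
print(count_leaves(tree_cpt))                      # 3 parameters in the tree-CPT
print(lookup(tree_cpt, {"A": 1, "B": 0, "C": 1}))  # 0.5
```

In this toy setting the tree-CPT needs 3 parameters instead of 8, which is the kind of saving that lets the learning procedure match model complexity to the interactions actually present in the data.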
