A “Microscopic” Study of Minimum Entropy Search in Learning Decomposable Markov Networks

Several scoring metrics are used in different search procedures for learning probabilistic networks. We study the properties of cross entropy in learning a decomposable Markov network. Although entropy and related scoring metrics have been widely used, their “microscopic” properties and asymptotic behavior in a search have not been analyzed. We present such a “microscopic” study of a minimum entropy search algorithm and show that it learns an I-map of the domain model when the data size is large. Search procedures that modify a network structure one link at a time have been commonly used for efficiency. Our study indicates that a class of domain models cannot be learned by such procedures. This suggests that prior knowledge about the problem domain, combined with a multi-link search strategy, would provide an effective way to uncover many domain models.
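To make the single-link search discussed above concrete, the following Python sketch shows a greedy procedure that adds one link at a time, scoring each candidate by its empirical mutual information, i.e. the entropy decrement the link would yield over an empty network. This is an illustrative assumption, not the paper's algorithm: the function names and stopping threshold are hypothetical, and the sketch omits the maintenance of a decomposable (chordal) structure and the re-scoring of candidates against the current model.

```python
import numpy as np
from itertools import combinations

def empirical_mutual_information(data, i, j):
    """Estimate I(X_i; X_j) from a 2-D array of discrete samples.

    Adding the link (i, j) to an empty network lowers the model's
    cross entropy against the data by roughly this amount.
    """
    xi, xj = data[:, i], data[:, j]
    mi = 0.0
    for a in np.unique(xi):
        for b in np.unique(xj):
            p_ab = np.mean((xi == a) & (xj == b))
            if p_ab > 0:
                p_a = np.mean(xi == a)
                p_b = np.mean(xj == b)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def greedy_single_link_search(data, threshold=1e-3):
    """Greedy single-link search: repeatedly add the link whose
    estimated entropy decrement is largest, stopping when no
    candidate exceeds the threshold. Returns the selected links."""
    n_vars = data.shape[1]
    links = set()
    candidates = set(combinations(range(n_vars), 2))
    while candidates:
        scored = [(empirical_mutual_information(data, i, j), (i, j))
                  for (i, j) in candidates]
        best_score, best_link = max(scored)
        if best_score < threshold:
            break
        links.add(best_link)
        candidates.remove(best_link)
    return links

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: X1 depends on X0 (noisy copy); X2 is independent noise.
    x0 = rng.integers(0, 2, size=2000)
    x1 = x0 ^ (rng.random(2000) < 0.1).astype(int)
    x2 = rng.integers(0, 2, size=2000)
    data = np.column_stack([x0, x1, x2])
    print(greedy_single_link_search(data))  # expect only the link (0, 1)
```

This kind of sketch also illustrates why some domain models defeat single-link procedures: if three binary variables satisfy a parity (XOR) relation with uniform marginals, every pairwise mutual information is zero, so a one-link-at-a-time search never scores a first link above threshold even though the joint dependence is strong. A multi-link step, guided by prior knowledge of where such higher-order interactions may occur, can escape this trap.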
