An Information-Geometric Approach to Learning Bayesian Network Topologies from Data

This work gives a general overview of structure learning for Bayesian networks (BNs) and then explores the feasibility of an information-geometric approach to learning the topology of a BN from data. It describes an information-geometric scoring function based on the Minimum Description Length (MDL) principle. The score accounts for two sources of complexity: the number of parameters in the BN, and the geometry of the statistical manifold onto which the BN's parametric family of probability distributions is mapped. The paper introduces information geometry and lays out a theoretical framework, supported by empirical evidence, showing that this information-geometric scoring function is at least as efficient as the Bayesian information criterion (BIC) and that, for certain BN topologies, it can drastically improve accuracy in selecting the best possible BN.
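To make the contrast between the two kinds of complexity concrete, the block below sets BIC beside the standard Fisher-information form of stochastic complexity (Rissanen, 1996). It is a sketch only: the abstract does not state the paper's exact score, so the notation is an assumption, with $G$ a candidate network structure, $D$ the data, $N$ the sample size, $d_G$ the number of free parameters, $\hat{\theta}_G$ the maximum-likelihood estimate, $\Theta_G$ the parameter space, and $I(\theta)$ the per-observation Fisher information matrix.

```latex
% BIC: fit term plus a penalty that depends only on the parameter count d_G
\mathrm{BIC}(G) = -\ln p(D \mid \hat{\theta}_G, G) + \frac{d_G}{2}\,\ln N

% Fisher-information form of stochastic complexity (assumed notation):
% the last term is the log-volume of the model manifold under the Fisher metric
\mathrm{SC}(G) = -\ln p(D \mid \hat{\theta}_G, G)
               + \frac{d_G}{2}\,\ln\frac{N}{2\pi}
               + \ln \int_{\Theta_G} \sqrt{\det I(\theta)}\; d\theta
```

In this form, two structures with the same $d_G$ receive identical BIC penalties but can differ in the volume term, which is one way a geometric score could separate topologies that BIC cannot.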
