The Estimation of Distributions and the Minimum Relative Entropy Principle

Estimation of Distribution Algorithms (EDAs) have been proposed as an extension of genetic algorithms. In this paper we explain the relationship of EDAs to algorithms developed in statistics, artificial intelligence, and statistical physics. The major design issues are discussed within a general interdisciplinary framework. It is shown that maximum entropy approximations play a crucial role. All proposed algorithms try to minimize the Kullback-Leibler divergence (KLD) between the unknown distribution p(x) and a class q(x) of approximations. However, the Kullback-Leibler divergence is not symmetric. Approximations which assume that the function to be optimized is additively decomposed (ADF) minimize KLD(q||p), whereas methods which learn the approximate model from data minimize KLD(p||q). Minimizing KLD(p||q) is equivalent to maximizing the log-likelihood. In this paper, three classes of algorithms are discussed. FDA uses the ADF to compute an approximate factorization of the unknown distribution; the factors are marginal distributions whose values are computed from samples. The second class is represented by the Bethe-Kikuchi approach, which has recently been rediscovered in statistical physics. Here the values of the marginals are computed by solving a difficult constrained minimization problem. The third class learns the factorization from the data. We analyze our learning algorithm LFDA in detail. It is shown that learning faces two problems: first, detecting the important dependencies between the variables, and second, creating an acyclic Bayesian network of bounded clique size.
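
The asymmetry of the KLD, and the link between KLD(p||q) and the log-likelihood, can be checked numerically. The Python sketch below is illustrative only: distributions are stored as small dictionaries, and the helper names (kld, entropy, empirical, avg_log_likelihood) are assumptions, not code from the paper. For the empirical distribution p_hat of samples x_1, ..., x_N, KLD(p_hat||q) = -H(p_hat) - (1/N) sum_n log q(x_n); because H(p_hat) does not depend on q, minimizing KLD(p_hat||q) over q is the same as maximizing the average log-likelihood.

```python
import math
from collections import Counter

def kld(p, q):
    """KLD(p||q) = sum_x p(x) * log(p(x) / q(x)) for distributions given as dicts.

    Terms with p(x) == 0 contribute 0; q(x) must be > 0 wherever p(x) > 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(px * math.log(px) for px in p.values() if px > 0)

def empirical(samples):
    """Empirical distribution p_hat of a list of samples."""
    n = len(samples)
    return {x: c / n for x, c in Counter(samples).items()}

def avg_log_likelihood(samples, q):
    """Average log-likelihood (1/N) * sum_n log q(x_n) of the samples under q."""
    return sum(math.log(q[x]) for x in samples) / len(samples)

# Asymmetry: the two directions of the divergence give different values.
p = {0: 0.9, 1: 0.1}
q = {0: 0.5, 1: 0.5}
print(round(kld(p, q), 4), round(kld(q, p), 4))  # 0.3681 0.5108

# For every candidate model, KLD(p_hat||cand) = -H(p_hat) - average log-likelihood.
samples = [0, 0, 0, 1, 0, 1, 0, 0]
p_hat = empirical(samples)
for cand in ({0: 0.5, 1: 0.5}, {0: 0.75, 1: 0.25}, {0: 0.9, 1: 0.1}):
    lhs = kld(p_hat, cand)
    rhs = -entropy(p_hat) - avg_log_likelihood(samples, cand)
    ll = avg_log_likelihood(samples, cand)
    print(f"KLD={lhs:.4f}  loglik={ll:.4f}  identity holds: {math.isclose(lhs, rhs, abs_tol=1e-12)}")
```

The candidate equal to p_hat ({0: 0.75, 1: 0.25}) has KLD 0 and the highest log-likelihood of the three, which is the maximum-likelihood view of minimizing KLD(p||q).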

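To make the FDA idea concrete, consider a hypothetical ADF whose subfunctions form a chain, f(x) = f_1(x0,x1) + f_2(x1,x2) + f_3(x2,x3). A factorization built from marginals then has the form q(x) = p(x0,x1) * p(x1,x2)/p(x1) * p(x2,x3)/p(x2). For such a chain this product is also the maximum-entropy distribution consistent with the pairwise marginals. The Python sketch below estimates those marginals by counting over a set of (for example, selected) samples, evaluates q(x), and draws new points from q. It is a minimal sketch under these assumptions: the chain structure, the helper names, and the data are illustrative and are not the paper's implementation.

```python
import random
from collections import Counter

# Illustrative assumption: a chain ADF f(x) = f_1(x0,x1) + f_2(x1,x2) + f_3(x2,x3).
N_VARS = 4

def estimate_marginals(samples):
    """Estimate p(x_i, x_{i+1}) and p(x_i) by relative frequencies over the samples."""
    n = len(samples)
    pair_p = [{k: c / n for k, c in Counter((s[i], s[i + 1]) for s in samples).items()}
              for i in range(N_VARS - 1)]
    single_p = [{k: c / n for k, c in Counter(s[i] for s in samples).items()}
                for i in range(N_VARS)]
    return pair_p, single_p

def q_prob(x, pair_p, single_p):
    """q(x) = p(x0,x1) * prod_{i=1}^{n-2} p(x_i, x_{i+1}) / p(x_i)."""
    prob = pair_p[0].get((x[0], x[1]), 0.0)
    for i in range(1, N_VARS - 1):
        p_i = single_p[i].get(x[i], 0.0)
        if prob == 0.0 or p_i == 0.0:
            return 0.0
        prob *= pair_p[i].get((x[i], x[i + 1]), 0.0) / p_i
    return prob

def sample_q(pair_p, rng):
    """Ancestral sampling: (x0, x1) ~ p(x0,x1), then x_{i+1} ~ p(x_{i+1} | x_i)."""
    first = list(pair_p[0])
    x = list(rng.choices(first, weights=[pair_p[0][k] for k in first])[0])
    for i in range(1, N_VARS - 1):
        # p(x_{i+1} | x_i) up to normalization; rng.choices normalizes the weights.
        cond = {b: pv for (a, b), pv in pair_p[i].items() if a == x[i]}
        vals = list(cond)
        x.append(rng.choices(vals, weights=[cond[v] for v in vals])[0])
    return tuple(x)

# Usage with a small hypothetical set of selected points.
selected = [(0, 0, 1, 1), (1, 1, 1, 0), (0, 1, 1, 1), (1, 1, 0, 0), (0, 0, 0, 0)]
pair_p, single_p = estimate_marginals(selected)
rng = random.Random(42)
new_population = [sample_q(pair_p, rng) for _ in range(10)]
print(new_population)
print(q_prob((0, 1, 1, 0), pair_p, single_p))
```

Because all marginals are counted from the same samples, they are mutually consistent and q sums to 1 over all points. Sampling in chain order only needs the conditionals p(x_{i+1} | x_i), which is why an acyclic structure with small cliques keeps both estimation and sampling cheap.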