Estimation of distribution algorithms and minimum relative entropy

In the field of optimization using probabilistic models of the search space, this thesis identifies and elaborates several advances in which the principles of maximum entropy and minimum relative entropy from information theory are used to estimate a probability distribution. The probability distribution over the search space is represented by a graphical model (a factorization, a Bayesian network, or a junction tree). An estimation of distribution algorithm (EDA) is an evolutionary optimization algorithm that uses a graphical model to sample a population within the search space and then estimates a new graphical model from the selected individuals of that population (a minimal sketch of this loop is given at the end of this section).

• Until now, the Factorized Distribution Algorithm (FDA) has built a factorization or Bayesian network from a given additive structure of the objective function using a greedy algorithm that considers only a subset of the variable dependencies, so important dependencies can be lost. This thesis presents a heuristic subfunction merge algorithm that takes all dependencies between the variables into account, as long as the marginal distributions of the model do not become too large. On a 2-D grid structure, this algorithm builds a pentavariate factorization which makes it possible to solve the deceptive grid benchmark problem with a much smaller population size than the conventional factorization. Especially for small population sizes, computing large marginal distributions from smaller ones using maximum entropy and iterative proportional fitting leads to a further improvement (see the IPF sketch directly below this list).

• The second topic is the generalization of graphical models to loopy structures. Using the Bethe-Kikuchi approximation, the loopy graphical model (a region graph) can learn the Boltzmann distribution of an objective function through a generalized belief propagation (GBP) algorithm. GBP minimizes the free energy, a notion adopted from statistical physics whose minimization is equivalent to minimizing the relative entropy to the Boltzmann distribution. Previous attempts to combine the Kikuchi approximation with an EDA have relied on an expensive Gibbs sampling procedure to generate a population from this loopy probabilistic model; this thesis presents a combination with a factorization that allows more efficient sampling. The free energy is generalized to incorporate the inverse temperature β, and the factorization-building algorithm mentioned above can be employed here as well.
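
The following is a minimal illustrative sketch, not the thesis implementation, of the iterative proportional fitting step referred to in the first item above: starting from the uniform distribution, a joint table over a few binary variables is repeatedly rescaled to match prescribed smaller marginals, and the fixed point is the maximum-entropy distribution consistent with those marginals. All function names and the example marginals are assumptions chosen for illustration.

```python
# Minimal sketch, not the thesis implementation: iterative proportional fitting (IPF)
# on three binary variables. Starting from the uniform joint, the table is repeatedly
# rescaled so that it reproduces the prescribed bivariate marginals; the fixed point
# is the maximum-entropy distribution consistent with them.
import numpy as np

def marginal(p, keep):
    """Sum the joint table p over all axes not listed in `keep` (keep must be sorted)."""
    drop = tuple(ax for ax in range(p.ndim) if ax not in keep)
    return p.sum(axis=drop)

def ipf(targets, n_vars, n_iter=200):
    """targets maps a sorted tuple of variable indices to the desired marginal table."""
    p = np.full((2,) * n_vars, 1.0 / 2 ** n_vars)           # uniform starting point
    for _ in range(n_iter):
        for axes, target in targets.items():
            current = marginal(p, axes)
            ratio = np.divide(target, current,
                              out=np.zeros_like(target), where=current > 0)
            shape = tuple(2 if ax in axes else 1 for ax in range(n_vars))
            p = p * ratio.reshape(shape)                     # rescale toward the target
    return p

# Example: two overlapping, consistent bivariate marginals p(x0, x1) and p(x1, x2).
targets = {
    (0, 1): np.array([[0.3, 0.2], [0.1, 0.4]]),
    (1, 2): np.array([[0.25, 0.15], [0.35, 0.25]]),
}
joint = ipf(targets, n_vars=3)
print(marginal(joint, (0, 1)).round(3))   # reproduces the prescribed p(x0, x1)
```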

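As a complement to the opening paragraph, here is a minimal sketch of the generic EDA loop, reduced to the simplest possible probabilistic model (independent bit-wise marginals, in the style of UMDA) rather than the factorizations or region graphs developed in the thesis. The OneMax objective and all parameter values are assumptions chosen for illustration.

```python
# Minimal sketch of the generic EDA loop: sample from the current model, select the
# best individuals, and re-estimate the model from the selected individuals.
import numpy as np

rng = np.random.default_rng(0)

def onemax(pop):
    return pop.sum(axis=1)                       # toy objective: count the ones

def eda(objective, n_vars=50, pop_size=200, n_selected=100, n_gens=50):
    p = np.full(n_vars, 0.5)                     # initial model: uniform marginals
    for _ in range(n_gens):
        # 1. Sample a population from the current probabilistic model.
        pop = (rng.random((pop_size, n_vars)) < p).astype(int)
        # 2. Select the best individuals (truncation selection).
        selected = pop[np.argsort(objective(pop))[-n_selected:]]
        # 3. Estimate a new model from the selected individuals.
        p = selected.mean(axis=0).clip(0.05, 0.95)   # clip to avoid premature fixation
    return p

print(eda(onemax).round(2))                      # marginals drift toward 1 on OneMax
```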