Towards the geometry of estimation of distribution algorithms based on the exponential family

In this paper we present a geometrical framework for the analysis of Estimation of Distribution Algorithms (EDAs) based on the exponential family. From a theoretical point of view, an EDA can be modeled as a sequence of densities in a statistical model that converges towards distributions with reduced support. Under this framework, at each iteration the empirical mean of the fitness function decreases in probability, until convergence of the population. This is the context of stochastic relaxation, i.e., the idea of looking for the minima of a function by minimizing its expected value over a set of probability densities. Our main interest is in the study of the gradient of the expected value of the function to be minimized, and in particular in how its landscape changes according to the fitness function and the statistical model used in the relaxation. After introducing some properties of the exponential family, such as the description of its topological closure and of its tangent space, we provide a characterization of the stationary points of the relaxed problem, together with a study of the minimizing sequences with reduced support. The analysis developed in the paper aims to provide a theoretical understanding of the behavior of EDAs, and in particular of their ability to converge to the global minimum of the fitness function. The theoretical results of this paper, besides providing a formal framework for the analysis of EDAs, lead to the definition of a new class of algorithms for binary function optimization based on Stochastic Natural Gradient Descent (SNGD), where the estimation of the parameters of the distribution is replaced by the direct update of the model parameters, obtained by estimating the natural gradient of the expected value of the fitness function.
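
The idea of updating the model parameters directly along an estimated natural gradient can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the simplest exponential family on binary vectors, the independence model on x in {-1,+1}^n with sufficient statistics T_i(x) = x_i, for which the Fisher information matrix is diagonal. The function name sngd_minimize, the fixed step size, the sample size, the regularization constant and the clamping of the parameters are all hypothetical choices made for illustration. The sketch relies on the standard exponential-family identity that the gradient of the expected fitness with respect to theta_i is Cov(f, T_i), estimated here from the sample.

```python
import math
import random


def sngd_minimize(f, n, steps=200, samples=100, lr=0.1, seed=0):
    """Illustrative stochastic natural gradient descent (hypothetical helper).

    Assumptions, not taken from the paper: independence model on {-1,+1}^n,
    sufficient statistics T_i(x) = x_i, fixed step size, empirical estimators.
    """
    rng = random.Random(seed)
    theta = [0.0] * n  # natural parameters; theta = 0 is the uniform distribution
    best_x, best_f = None, math.inf
    for _ in range(steps):
        # Under this model P(x_i = +1) = 1 / (1 + exp(-2 * theta_i)).
        p_plus = [1.0 / (1.0 + math.exp(-2.0 * t)) for t in theta]
        pop = [[1 if rng.random() < p else -1 for p in p_plus]
               for _ in range(samples)]
        vals = [f(x) for x in pop]
        # Track the best sampled point seen so far.
        for x, v in zip(pop, vals):
            if v < best_f:
                best_x, best_f = x, v
        mean_f = sum(vals) / samples
        for i in range(n):
            mean_xi = sum(x[i] for x in pop) / samples
            # Empirical Cov(f, x_i): estimate of the i-th gradient component.
            cov_f_xi = sum((v - mean_f) * (x[i] - mean_xi)
                           for x, v in zip(pop, vals)) / samples
            # Empirical Var(x_i): estimate of the i-th diagonal Fisher entry.
            var_xi = sum((x[i] - mean_xi) ** 2 for x in pop) / samples
            # Natural gradient component; 1e-6 guards against division by zero.
            nat_grad = cov_f_xi / (var_xi + 1e-6)
            theta[i] -= lr * nat_grad
            # Clamp so the distribution does not fully degenerate (assumption).
            theta[i] = max(-5.0, min(5.0, theta[i]))
    return theta, best_x, best_f
```

For example, calling sngd_minimize(lambda x: sum(x), n=5) applies the sketch to a linear fitness function whose minimum is the all-minus-one vector. Models with interactions between variables would need the full empirical covariance matrix of the sufficient statistics in place of the per-coordinate variance used here.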
