A view of estimation of distribution algorithms through the lens of expectation-maximization

We show that a large class of Estimation of Distribution Algorithms (EDAs), including, but not limited to, Covariance Matrix Adaptation, can be written as a Monte Carlo Expectation-Maximization (EM) algorithm, and as exact EM in the limit of infinite samples. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a new, coherent framework with which to reason about EDAs.
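To make the claimed correspondence concrete, the following is a minimal sketch of one simple Gaussian EDA, arranged so that each iteration reads as an EM-like pair of steps. It is an illustration under stated assumptions, not the construction used in the paper: the toy objective `sphere`, the truncation-based weighting, the function name `gaussian_eda`, and all parameter values are hypothetical choices made here. Drawing samples from the current search distribution and weighting them by fitness plays the role of a Monte Carlo E-step, and refitting the Gaussian by weighted maximum likelihood plays the role of the M-step.

```python
import numpy as np


def sphere(x):
    """Toy objective to minimize (an assumption made for illustration)."""
    return np.sum(x ** 2, axis=-1)


def gaussian_eda(f, mean, cov, n_samples=500, elite_frac=0.2, n_iters=30, seed=0):
    """Simple Gaussian EDA with truncation selection (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    n_elite = max(2, int(elite_frac * n_samples))
    dim = len(mean)
    for _ in range(n_iters):
        # Sample-and-weight step, read here as a Monte Carlo E-step:
        # draw from the current search distribution and weight by fitness.
        x = rng.multivariate_normal(mean, cov, size=n_samples)
        scores = f(x)
        weights = np.zeros(n_samples)
        weights[np.argsort(scores)[:n_elite]] = 1.0 / n_elite  # lowest scores win

        # Refit step, read here as an M-step:
        # weighted maximum-likelihood estimate of the Gaussian parameters.
        mean = weights @ x
        centered = x - mean
        cov = (weights[:, None] * centered).T @ centered
        cov += 1e-8 * np.eye(dim)  # small jitter keeps the covariance positive definite
    return mean, cov


mean, cov = gaussian_eda(sphere, mean=np.array([1.0, -1.0]), cov=4.0 * np.eye(2))
print("final mean:", mean)
```

The uniform weights on the best samples are only one possible weighting; any nonnegative, fitness-dependent weighting that sums to one fits the same two-step pattern, which is the sense in which the sketch mirrors the sample-based EM reading described above.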
