Consensus-based Optimization on the Sphere II: Convergence to Global Minimizers and Machine Learning

We present the implementation of a new stochastic Kuramoto-Vicsek-type model for the global optimization of nonconvex functions on the sphere. The model belongs to the class of Consensus-Based Optimization methods. Particles move on the sphere, driven by a drift towards an instantaneous consensus point. This point approximates a global minimizer and is computed as a convex combination of the particle locations, weighted by the cost function according to Laplace's principle. To favor exploration, the dynamics is further perturbed by a random vector field whose variance is a function of the distance of each particle from the consensus point. In particular, the stochastic component vanishes as soon as consensus is reached. The main result of this paper is a proof that the numerical scheme converges to global minimizers, provided the initial datum is well prepared. The proof combines previous mean-field limit results with a novel asymptotic analysis and classical convergence results for numerical methods for SDEs. We present several numerical experiments showing that the proposed algorithm scales well with the dimension and is extremely versatile. To quantify the performance of the new approach, we show that the algorithm performs essentially as well as ad hoc state-of-the-art methods on challenging problems in signal processing and machine learning, namely phase retrieval and robust subspace detection.
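The dynamics described above can be illustrated with a small particle simulation. The following is a minimal sketch, not the authors' scheme. It assumes Laplace-principle weights exp(-alpha * f(v)) for the consensus point. Each particle takes an Euler-Maruyama step whose drift is the consensus point projected onto the particle's tangent space. The noise is Gaussian, projected onto the same tangent space and scaled by sigma times the particle's distance from the consensus point. The result is then renormalized back onto the sphere. In this simplified version the renormalization stands in for an explicit Itô correction term. All function names, parameter values (`alpha`, `lam`, `sigma`, `dt`, `steps`) and the test cost function are hypothetical choices made only for illustration.

```python
import math
import random


def normalize(v):
    """Project a nonzero vector of R^d back onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def random_on_sphere(d, rng):
    """Uniform sample on S^{d-1}: normalize a standard Gaussian vector."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(d)])


def consensus_point(particles, f, alpha):
    """Convex combination of particles with Laplace-principle weights exp(-alpha f)."""
    values = [f(v) for v in particles]
    f_min = min(values)  # shift by the minimum to avoid underflow in exp()
    weights = [math.exp(-alpha * (fv - f_min)) for fv in values]
    total = sum(weights)
    d = len(particles[0])
    return [sum(w * v[k] for w, v in zip(weights, particles)) / total for k in range(d)]


def tangent_projection(v, u):
    """P(v) u = u - (v . u) v: component of u tangent to the sphere at v."""
    dot = sum(a * b for a, b in zip(v, u))
    return [b - dot * a for a, b in zip(v, u)]


def cbo_step(particles, f, alpha, lam, sigma, dt, rng):
    """One projected Euler-Maruyama step, then renormalization (simplified sketch)."""
    v_alpha = consensus_point(particles, f, alpha)
    updated = []
    for v in particles:
        # Distance to the consensus point sets the noise level; it vanishes at consensus.
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, v_alpha)))
        drift = tangent_projection(v, v_alpha)
        noise = tangent_projection(v, [rng.gauss(0.0, 1.0) for _ in v])
        scale = sigma * dist * math.sqrt(dt)
        moved = [x + lam * dt * a + scale * b for x, a, b in zip(v, drift, noise)]
        updated.append(normalize(moved))
    return updated


def cbo_sphere(f, d, n_particles=200, alpha=50.0, lam=1.0, sigma=0.5,
               dt=0.01, steps=500, seed=0):
    """Run the hypothetical sketch; return the normalized consensus point and particles."""
    rng = random.Random(seed)
    particles = [random_on_sphere(d, rng) for _ in range(n_particles)]
    for _ in range(steps):
        particles = cbo_step(particles, f, alpha, lam, sigma, dt, rng)
    return normalize(consensus_point(particles, f, alpha)), particles


if __name__ == "__main__":
    # Hypothetical test cost on S^2, minimized at the north pole (0, 0, 1).
    cost = lambda v: 1.0 - v[2]
    v_star, _ = cbo_sphere(cost, d=3)
    print("approximate minimizer:", v_star)
```

Shifting the exponent by the smallest cost leaves the normalized weights unchanged, because the common factor cancels. This keeps the weights finite even for large `alpha`, which is the regime where the consensus point approaches the best particle, as Laplace's principle suggests.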