Scalable Derivative-Free Optimization for Nonlinear Least-Squares Problems

Derivative-free (or zeroth-order) optimization (DFO) has recently gained attention for its ability to solve problems in a variety of application areas, including machine learning, particularly those with objectives that are stochastic and/or expensive to compute. In this work, we develop a novel model-based DFO method for solving nonlinear least-squares problems. We improve on state-of-the-art DFO by performing dimensionality reduction in the observational space using sketching methods, which avoids constructing a full local model. Our approach has a per-iteration computational cost that is linear in the problem dimension in a big-data regime, and numerical evidence demonstrates that, compared to existing software, it dramatically improves runtime performance on overdetermined least-squares problems.
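To make the sketching idea concrete, here is a minimal Python sketch of one step of this kind of method. It is not the authors' implementation: the Gaussian sketch, the coordinate-basis interpolation set, and the crude trust-region safeguard are all illustrative assumptions. The point it demonstrates is that interpolating a model of the sketched residuals S @ r(x), rather than of r(x) itself, means only a reduced p x n model is ever formed and solved, never a full m x n one.

```python
import numpy as np

def sketched_dfo_gn_step(r, x, delta, p, rng):
    """One illustrative step: build a linear interpolation model of the
    *sketched* residuals S @ r(x) and take a Gauss-Newton step on it.

    r     : callable, residual map R^n -> R^m (m >> n assumed)
    x     : current iterate, shape (n,)
    delta : trust-region radius, also used as the sampling distance
    p     : sketch dimension (p << m)
    rng   : numpy.random.Generator
    """
    n = x.size
    r0 = r(x)                              # one m-dimensional residual evaluation
    m = r0.size

    # Scaled Gaussian sketch: E[S.T @ S] = I_m, so ||S v|| ~= ||v||.
    S = rng.standard_normal((p, m)) / np.sqrt(p)
    sr0 = S @ r0                           # sketched residual, shape (p,)

    # Linear model of the sketched residuals, interpolated from n
    # coordinate perturbations of size delta (the simplest sample set).
    J_hat = np.empty((p, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = delta
        J_hat[:, i] = (S @ r(x + e) - sr0) / delta

    # Gauss-Newton step on the reduced model: min_s ||sr0 + J_hat @ s||^2.
    # Only a p x n least-squares system is solved, never an m x n one.
    s, *_ = np.linalg.lstsq(J_hat, -sr0, rcond=None)

    # Crude trust-region safeguard (a real solver would solve the
    # trust-region subproblem properly and adapt delta).
    norm = np.linalg.norm(s)
    if norm > delta:
        s *= delta / norm
    return x + s


# Example usage on an overdetermined linear residual r(x) = A @ x - b.
rng = np.random.default_rng(0)
m, n = 10_000, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = np.zeros(n)
for _ in range(10):
    x = sketched_dfo_gn_step(lambda z: A @ z - b, x, delta=1.0, p=50, rng=rng)
```

A practical solver would also reuse residual evaluations across iterations and manage the interpolation set geometry; this toy version re-samples all n points at every step purely for clarity.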
