An Inexact Variable Metric Proximal Point Algorithm for Generic Quasi-Newton Acceleration

We propose an inexact variable-metric proximal point algorithm to accelerate gradient-based optimization methods. The proposed scheme, called QNing, can notably be applied to incremental first-order methods such as the stochastic variance-reduced gradient descent algorithm (SVRG) and other randomized incremental optimization algorithms. QNing is also compatible with composite objectives, meaning that it can provide exactly sparse solutions when the objective involves a sparsity-inducing regularization. When combined with limited-memory BFGS rules, QNing is particularly effective for solving high-dimensional optimization problems, while enjoying a worst-case linear convergence rate for strongly convex problems. We present experimental results in which QNing gives significant improvements over competing methods for training machine learning models on large datasets and in high dimensions.
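To make the high-level description concrete, the sketch below shows one way such a variable-metric proximal point outer loop could be organized in Python. This is a minimal illustration, not the authors' implementation: the function names, the fixed smoothing parameter `kappa`, the unit step size, and the generic `prox_subproblem_solver` (standing in for an inner solver such as SVRG) are all illustrative assumptions. The outer loop estimates the gradient of the Moreau-Yosida envelope from the inexact proximal point and applies limited-memory BFGS rules to that smoothed surrogate.

```python
import numpy as np

def qning_style_outer_loop(prox_subproblem_solver, x0, kappa=1.0,
                           n_iters=100, memory=10):
    """Sketch of a variable-metric proximal point outer loop.

    The smoothed objective is the Moreau-Yosida envelope
        F(x) = min_z f(z) + (kappa/2) * ||z - x||^2,
    whose gradient is kappa * (x - z*(x)), where z*(x) is the (possibly
    inexact) proximal point returned by the inner solver. An L-BFGS
    two-loop recursion on (s, y) pairs of F supplies the variable metric.
    """
    x = x0.copy()
    s_hist, y_hist = [], []            # L-BFGS memory for the envelope F
    x_prev, g_prev = None, None

    for _ in range(n_iters):
        # Inner solver (e.g. SVRG) approximately minimizes
        #   z -> f(z) + (kappa/2) * ||z - x||^2.
        z = prox_subproblem_solver(x, kappa)
        g = kappa * (x - z)            # approximate gradient of F at x

        if g_prev is not None:
            s, y = x - x_prev, g - g_prev
            if s @ y > 1e-12:          # curvature condition on F
                s_hist.append(s); y_hist.append(y)
                if len(s_hist) > memory:
                    s_hist.pop(0); y_hist.pop(0)

        x_prev, g_prev = x.copy(), g.copy()
        x = x - lbfgs_two_loop(g, s_hist, y_hist)   # quasi-Newton step on F
    return x


def lbfgs_two_loop(g, s_hist, y_hist):
    """Standard L-BFGS two-loop recursion; returns an approximation of H*g."""
    q = g.copy()
    alphas = []                        # stored newest-pair first
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if s_hist:
        s, y = s_hist[-1], y_hist[-1]
        q *= (s @ y) / (y @ y)         # initial Hessian scaling
    for (s, y), a in zip(zip(s_hist, y_hist), reversed(alphas)):
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return q
```

In practice a line search or a sufficient-decrease test on the envelope would guard the quasi-Newton step, and the inexactness of the inner solver would be controlled explicitly; both are omitted here for brevity.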
