Projected Newton-type methods in machine learning

We consider projected Newton-type methods for solving large-scale optimization problems arising in machine learning and related fields. We first introduce an algorithmic framework for projected Newton-type methods by reviewing a canonical projected (quasi-)Newton method. This method, while conceptually pleasing, has a high computational cost per iteration. We therefore discuss two more scalable variants: two-metric projection and inexact projection methods. Finally, we show how to apply the Newton-type framework to non-smooth objectives. Examples throughout the chapter illustrate machine learning applications of the framework.
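As a rough illustration of the two-metric projection idea named above, the sketch below applies it to nonnegative least squares. The test problem, the function name `two_metric_projection_nnls`, and all parameter defaults are assumptions made for this example and are not taken from the chapter. Variables that sit at the bound with an outward-pointing gradient take a plain gradient step, the remaining variables take a Newton step on the corresponding Hessian submatrix, and the combined step is projected onto the nonnegative orthant with a backtracking search along the projection arc.

```python
import numpy as np


def two_metric_projection_nnls(A, b, max_iter=100, eps=1e-3, sigma=1e-4, tol=1e-10):
    """Sketch: minimize 0.5 * ||A x - b||^2 subject to x >= 0.

    Assumes A has full column rank so every Hessian submatrix is invertible.
    """
    H = A.T @ A                      # exact Hessian of the quadratic objective
    Atb = A.T @ b
    x = np.zeros(A.shape[1])

    def f(z):
        r = A @ z - b
        return 0.5 * (r @ r)

    for _ in range(max_iter):
        g = H @ x - Atb
        # Norm of the projected-gradient step; it is zero exactly at a KKT point.
        w = np.linalg.norm(x - np.maximum(x - g, 0.0))
        if w < tol:
            break

        # Variables at (or near) the bound whose gradient pushes them outward.
        bound = (x <= min(eps, w)) & (g > 0)
        free = ~bound

        d = np.empty_like(x)
        d[bound] = g[bound]          # plain gradient scaling on bound variables
        if free.any():
            # Newton scaling restricted to the free variables.
            d[free] = np.linalg.solve(H[np.ix_(free, free)], g[free])

        # Backtracking (Armijo-type) search along the projection arc.
        fx, alpha = f(x), 1.0
        for _ in range(50):
            x_new = np.maximum(x - alpha * d, 0.0)
            required = sigma * (alpha * (g[free] @ d[free])
                                + g[bound] @ (x[bound] - x_new[bound]))
            if fx - f(x_new) >= required:
                break
            alpha *= 0.5
        x = x_new
    return x


# Small usage example: the unconstrained minimizer is [1, -1], so the
# nonnegativity constraint is active in the second coordinate.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, -1.0, 0.0])
x_hat = two_metric_projection_nnls(A, b)   # constrained minimizer is [0.5, 0.0]
```

Keeping the scaling matrix diagonal on the bound variables is what lets a simple Euclidean projection be used here. Projecting a full Newton step in the same way would not, in general, give a descent direction.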
