Projected Newton-type methods in machine learning

We consider projected Newton-type methods for solving large-scale optimization problems arising in machine learning and related fields. We first introduce an algorithmic framework for projected Newton-type methods by reviewing a canonical projected (quasi-)Newton method. This method, while conceptually pleasing, has a high computational cost per iteration. Thus, we discuss two more scalable variants, namely two-metric projection and inexact projection methods. Finally, we show how to apply the Newton-type framework to handle non-smooth objectives. Examples are provided throughout the chapter to illustrate machine learning applications of our framework.
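The general pattern the abstract describes (minimize a scaled quadratic model over the feasible set, then step toward its minimizer with a line search) can be illustrated with a small sketch. This is not the chapter's algorithm: it is a simplified instance chosen for illustration, assuming a least-squares objective, box constraints, and a diagonal Hessian approximation. With a diagonal scaling and a box, the scaled projection separates by coordinate and reduces to clipping, which avoids the expensive projection subproblem of the canonical method. The function name `projected_diag_newton` and all of its parameters are hypothetical.

```python
import numpy as np

def projected_diag_newton(A, b, lower=0.0, upper=np.inf, max_iter=500, tol=1e-10):
    """Sketch: minimize 0.5 * ||A x - b||^2 subject to lower <= x <= upper.

    Assumption: the Hessian A^T A is approximated by its diagonal, so the
    projection in the scaled norm onto the box is a coordinate-wise clip.
    """
    n = A.shape[1]
    x = np.clip(np.zeros(n), lower, upper)          # feasible starting point
    h = np.maximum(np.sum(A * A, axis=0), 1e-12)    # diag(A^T A), kept positive

    def f(v):
        r = A @ v - b
        return 0.5 * float(r @ r)

    for _ in range(max_iter):
        g = A.T @ (A @ x - b)                       # gradient at x
        z = np.clip(x - g / h, lower, upper)        # minimizer of the scaled model over the box
        d = z - x                                   # feasible descent direction
        if np.linalg.norm(d) <= tol:
            break
        # Backtracking (Armijo) line search along d; x + alpha * d stays
        # feasible for alpha in [0, 1] because the box is convex.
        alpha, fx, slope = 1.0, f(x), float(g @ d)
        while f(x + alpha * d) > fx + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * d
    return x

# Usage: non-negative least squares with an identity design matrix.
x_hat = projected_diag_newton(np.eye(3), np.array([1.0, -2.0, 3.0]))
```

In this usage example the unconstrained minimizer is `b` itself, so the constrained solution simply sets the negative coordinate to zero. With a full (non-diagonal) Hessian approximation the projection step no longer reduces to clipping, which is the per-iteration cost that the more scalable variants are meant to address.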

Mark W. Schmidt | Suvrit Sra | Dongjae Kim
