Manifold Identification of Dual Averaging Methods for Regularized Stochastic Online Learning

Iterative methods that step along approximate subgradient directions have proved useful for stochastic learning problems over large or streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term that induces structure (for example, sparsity) in the solution, the minimizer often lies on a low-dimensional manifold along which the regularizer is smooth. This paper shows that a regularized dual averaging algorithm can identify this manifold with high probability. This observation motivates a two-phase algorithmic strategy: once a near-optimal manifold is identified, we switch to an algorithm that searches only within that manifold, whose intrinsic dimension is typically much lower than that of the full space, and which therefore converges quickly to a near-optimal point with the desired structure. Computational results are presented to illustrate these claims.
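To make the two-phase strategy concrete, the following is a minimal sketch of the l1-regularized dual averaging (RDA) update of Xiao (2010), paired with a simple support-stability test that could serve as the trigger for switching to a manifold-restricted phase. The closed-form soft-thresholding step is the standard l1-RDA update, but the step-size constant `gamma`, the regularization weight `lam`, the `patience` window, and the synthetic logistic-regression data are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of l1-regularized dual averaging (RDA) with a
# support-stability heuristic for manifold identification. The names
# gamma, lam, and patience, and the synthetic data, are hypothetical.
import numpy as np

def rda_l1(grad_fn, dim, steps, lam=0.1, gamma=1.0, patience=50, rng=None):
    """Run l1-RDA; return the final iterate and the step at which the
    support (zero/nonzero pattern) last changed."""
    rng = np.random.default_rng(rng)
    w = np.zeros(dim)
    gbar = np.zeros(dim)          # running average of stochastic subgradients
    support = w != 0
    last_change = 0
    for t in range(1, steps + 1):
        g = grad_fn(w, rng)
        gbar += (g - gbar) / t    # gbar_t = average of g_1, ..., g_t
        # Closed-form RDA step: soft-threshold the averaged gradient at
        # level lam, then scale by sqrt(t) / gamma.
        w = -(np.sqrt(t) / gamma) * np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
        new_support = w != 0
        if not np.array_equal(new_support, support):
            support, last_change = new_support, t
        elif t - last_change >= patience:
            break                 # support stable: candidate manifold identified
    return w, last_change

def make_logistic_grad(n=1000, dim=50, sparsity=5, noise=0.1, seed=0):
    """Synthetic sparse logistic-regression problem; returns a function
    producing one-sample stochastic gradients of the logistic loss."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))
    w_true = np.zeros(dim)
    w_true[:sparsity] = 1.0
    y = np.sign(X @ w_true + noise * rng.standard_normal(n))
    def grad_fn(w, rng):
        i = rng.integers(n)
        margin = y[i] * (X[i] @ w)
        return -y[i] * X[i] / (1.0 + np.exp(margin))
    return grad_fn

if __name__ == "__main__":
    grad_fn = make_logistic_grad()
    w, t_id = rda_l1(grad_fn, dim=50, steps=20000, lam=0.05)
    print(f"support size {int((w != 0).sum())}, stable since step {t_id}")
```

In this sketch the manifold associated with the l1 regularizer is the set of points sharing the current zero/nonzero pattern, so a stabilized support signals a candidate manifold on which a reduced, smooth method could take over. The fixed-window stability test is only a stand-in: the paper's identification guarantee is probabilistic rather than based on such a heuristic.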
