Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

Iterative methods that compute their steps from approximate subgradient directions have proved useful for stochastic learning problems over large and streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, the solution often lies on a low-dimensional manifold of the parameter space along which the regularizer is smooth. (When an ℓ1 regularizer is used to induce sparsity in the solution, for example, this manifold is defined by the set of nonzero components of the parameter vector.) This paper shows that a regularized dual averaging algorithm can identify this manifold, with high probability, before reaching the solution. This observation motivates an algorithmic strategy in which, once an iterate is suspected of lying on an optimal or near-optimal manifold, we switch to a "local phase" that searches within this manifold, converging rapidly to a near-optimal point. Computational results are presented to verify the identification property and to illustrate the effectiveness of this approach.
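To make the identification property concrete, the following is a minimal illustrative sketch, not the paper's implementation, of ℓ1-regularized dual averaging in the style of Xiao's RDA method. The function names (rda_l1, grad_fn), the logistic loss, the synthetic data, and all parameter values are assumptions chosen for illustration. The key point is the closed-form soft-threshold step, which sets components exactly to zero, so the sparsity pattern (the manifold) can settle down well before the iterates themselves converge.

```python
import numpy as np

def rda_l1(grad_fn, dim, n_iters, lam=0.05, gamma=1.0, seed=0):
    """Illustrative l1-regularized dual averaging (RDA) sketch.

    grad_fn(w, rng) returns a stochastic subgradient of the loss at w;
    lam is the l1 weight and gamma the strong-convexity constant of the
    auxiliary prox term.  Returns the final iterate and the history of
    supports (sets of nonzero indices) for monitoring identification.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    g_bar = np.zeros(dim)                  # running average of subgradients
    supports = []
    for t in range(1, n_iters + 1):
        g_bar += (grad_fn(w, rng) - g_bar) / t
        # Closed-form RDA step: soft-threshold the averaged gradient.
        # Components with |g_bar| <= lam become exactly zero, which is
        # the mechanism behind manifold (sparsity-pattern) identification.
        shrink = np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
        w = -(np.sqrt(t) / gamma) * shrink
        supports.append(frozenset(np.flatnonzero(w)))
    return w, supports

# Hypothetical usage: logistic loss on synthetic data with a sparse truth.
d, n = 50, 5000
rng = np.random.default_rng(1)
w_true = np.zeros(d)
w_true[:5] = 1.0
X = rng.standard_normal((n, d))
y = np.where(X @ w_true > 0, 1.0, -1.0)

def grad_fn(w, rng):
    i = rng.integers(n)                    # sample one data point
    margin = y[i] * (X[i] @ w)
    return -y[i] * X[i] / (1.0 + np.exp(margin))   # logistic loss gradient

w, supports = rda_l1(grad_fn, d, n_iters=5000)
# A switch to a "local phase" would be triggered once the support has
# been stable for a while, e.g. unchanged over the last several hundred
# steps; the local phase would then optimize over those components only.
print("final support:", sorted(supports[-1]))
```

In this sketch the "local phase" is only indicated by a comment: under the paper's strategy, once the support is judged stable one would restrict the problem to the identified manifold (here, the nonzero components) and apply a method that exploits the regularizer's smoothness on that manifold.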
