Manifold Identification of Dual Averaging Methods for Regularized Stochastic Online Learning

Iterative methods that step along approximate subgradient directions have proved useful for stochastic learning problems over large or streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term that induces structure (for example, sparsity) in the solution, the minimizer often lies on a low-dimensional manifold along which the regularizer is smooth. This paper shows that a regularized dual averaging algorithm can identify this manifold with high probability. This observation motivates a two-phase algorithmic strategy: once a near-optimal manifold is identified, we switch to an algorithm that searches only within that manifold, whose intrinsic dimension is typically much lower than that of the full space, and which therefore converges quickly to a near-optimal point with the desired structure. Computational results are presented to illustrate these claims.
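To make the two-phase strategy concrete, the following is a minimal sketch of the l1-regularized dual averaging (RDA) update of Xiao (2010), paired with a simple support-stability test that could serve as the trigger for switching to a manifold-restricted phase. The closed-form soft-thresholding step is the standard l1-RDA update, but the step-size constant `gamma`, the regularization weight `lam`, the `patience` window, and the synthetic logistic-regression data are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of l1-regularized dual averaging (RDA) with a
# support-stability heuristic for manifold identification. The names
# gamma, lam, and patience, and the synthetic data, are hypothetical.
import numpy as np

def rda_l1(grad_fn, dim, steps, lam=0.1, gamma=1.0, patience=50, rng=None):
    """Run l1-RDA; return the final iterate and the step at which the
    support (zero/nonzero pattern) last changed."""
    rng = np.random.default_rng(rng)
    w = np.zeros(dim)
    gbar = np.zeros(dim)          # running average of stochastic subgradients
    support = w != 0
    last_change = 0
    for t in range(1, steps + 1):
        g = grad_fn(w, rng)
        gbar += (g - gbar) / t    # gbar_t = average of g_1, ..., g_t
        # Closed-form RDA step: soft-threshold the averaged gradient at
        # level lam, then scale by sqrt(t) / gamma.
        w = -(np.sqrt(t) / gamma) * np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
        new_support = w != 0
        if not np.array_equal(new_support, support):
            support, last_change = new_support, t
        elif t - last_change >= patience:
            break                 # support stable: candidate manifold identified
    return w, last_change

def make_logistic_grad(n=1000, dim=50, sparsity=5, noise=0.1, seed=0):
    """Synthetic sparse logistic-regression problem; returns a function
    producing one-sample stochastic gradients of the logistic loss."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, dim))
    w_true = np.zeros(dim)
    w_true[:sparsity] = 1.0
    y = np.sign(X @ w_true + noise * rng.standard_normal(n))
    def grad_fn(w, rng):
        i = rng.integers(n)
        margin = y[i] * (X[i] @ w)
        return -y[i] * X[i] / (1.0 + np.exp(margin))
    return grad_fn

if __name__ == "__main__":
    grad_fn = make_logistic_grad()
    w, t_id = rda_l1(grad_fn, dim=50, steps=20000, lam=0.05)
    print(f"support size {int((w != 0).sum())}, stable since step {t_id}")
```

In this sketch the manifold associated with the l1 regularizer is the set of points sharing the current zero/nonzero pattern, so a stabilized support signals a candidate manifold on which a reduced, smooth method could take over. The fixed-window stability test is only a stand-in: the paper's identification guarantee is probabilistic rather than based on such a heuristic.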
