Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

Iterative methods that compute their steps from approximate subgradient directions have proved useful for stochastic learning problems over large and streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, the solution often lies on a low-dimensional manifold of the parameter space along which the regularizer is smooth. (When an ℓ1 regularizer is used to induce sparsity in the solution, for example, this manifold is defined by the set of nonzero components of the parameter vector.) This paper shows that a regularized dual averaging algorithm can identify this manifold, with high probability, before reaching the solution. This observation motivates an algorithmic strategy in which, once an iterate is suspected of lying on an optimal or near-optimal manifold, we switch to a "local phase" that searches within this manifold, converging rapidly to a near-optimal point. Computational results are presented to verify the identification property and to illustrate the effectiveness of this approach.
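To make the identification property concrete, the following is a minimal illustrative sketch, not the paper's implementation, of ℓ1-regularized dual averaging in the style of Xiao's RDA method. The function names (rda_l1, grad_fn), the logistic loss, the synthetic data, and all parameter values are assumptions chosen for illustration. The key point is the closed-form soft-threshold step, which sets components exactly to zero, so the sparsity pattern (the manifold) can settle down well before the iterates themselves converge.

```python
import numpy as np

def rda_l1(grad_fn, dim, n_iters, lam=0.05, gamma=1.0, seed=0):
    """Illustrative l1-regularized dual averaging (RDA) sketch.

    grad_fn(w, rng) returns a stochastic subgradient of the loss at w;
    lam is the l1 weight and gamma the strong-convexity constant of the
    auxiliary prox term.  Returns the final iterate and the history of
    supports (sets of nonzero indices) for monitoring identification.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    g_bar = np.zeros(dim)                  # running average of subgradients
    supports = []
    for t in range(1, n_iters + 1):
        g_bar += (grad_fn(w, rng) - g_bar) / t
        # Closed-form RDA step: soft-threshold the averaged gradient.
        # Components with |g_bar| <= lam become exactly zero, which is
        # the mechanism behind manifold (sparsity-pattern) identification.
        shrink = np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam, 0.0)
        w = -(np.sqrt(t) / gamma) * shrink
        supports.append(frozenset(np.flatnonzero(w)))
    return w, supports

# Hypothetical usage: logistic loss on synthetic data with a sparse truth.
d, n = 50, 5000
rng = np.random.default_rng(1)
w_true = np.zeros(d)
w_true[:5] = 1.0
X = rng.standard_normal((n, d))
y = np.where(X @ w_true > 0, 1.0, -1.0)

def grad_fn(w, rng):
    i = rng.integers(n)                    # sample one data point
    margin = y[i] * (X[i] @ w)
    return -y[i] * X[i] / (1.0 + np.exp(margin))   # logistic loss gradient

w, supports = rda_l1(grad_fn, d, n_iters=5000)
# A switch to a "local phase" would be triggered once the support has
# been stable for a while, e.g. unchanged over the last several hundred
# steps; the local phase would then optimize over those components only.
print("final support:", sorted(supports[-1]))
```

In this sketch the "local phase" is only indicated by a comment: under the paper's strategy, once the support is judged stable one would restrict the problem to the identified manifold (here, the nonzero components) and apply a method that exploits the regularizer's smoothness on that manifold.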
