Sparse Learning for Stochastic Composite Optimization

In this paper, we focus on Stochastic Composite Optimization (SCO) for sparse learning, which aims to learn a sparse solution. Although many SCO algorithms have been developed for sparse learning with an optimal convergence rate of $O(1/T)$, they often fail to deliver truly sparse solutions in the end, either because the sparsity regularization applied during stochastic optimization is limited or because of the limitations of the online-to-batch conversion. To improve the sparsity of the solutions obtained by SCO, we propose a simple but effective stochastic optimization scheme that adds a novel sparse online-to-batch conversion to traditional SCO algorithms. Our theoretical analysis shows that the proposed scheme finds solutions with better sparsity patterns without affecting the convergence rate. Experimental results on both synthetic and real-world data sets show that the proposed methods are more effective in recovering sparse solutions and achieve a convergence rate comparable to that of state-of-the-art SCO algorithms for sparse learning.
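
To make the setting concrete, the sketch below illustrates the general idea of a sparse online-to-batch conversion on an $\ell_1$-regularized SCO problem. It is not the paper's algorithm: it assumes a plain stochastic proximal-gradient (forward-backward splitting) solver, a uniform running average of the iterates, and a final soft-thresholding of the averaged iterate as a stand-in for the sparse conversion step; the functions `soft_threshold`, `sco_l1_sparse_o2b`, and `make_oracle`, as well as the final threshold level, are illustrative choices.

```python
import numpy as np

def soft_threshold(v, tau):
    """Component-wise soft-thresholding: the prox operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sco_l1_sparse_o2b(grad_oracle, dim, lam, T, eta0, seed=0):
    """Sketch of l1-regularized stochastic composite optimization with a
    sparsified online-to-batch conversion (illustrative, not the paper's scheme).

    grad_oracle(w, rng) returns an unbiased stochastic gradient of the smooth loss at w.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    w_avg = np.zeros(dim)
    for t in range(1, T + 1):
        eta = eta0 / np.sqrt(t)                      # standard O(1/sqrt(t)) step size
        g = grad_oracle(w, rng)                      # stochastic gradient of the smooth part
        w = soft_threshold(w - eta * g, eta * lam)   # forward-backward (prox) step
        w_avg += (w - w_avg) / t                     # running average (classical conversion)
    # Plain averaging usually destroys sparsity; re-apply the l1 prox to the
    # averaged iterate so the returned batch solution is sparse as well.
    # The threshold level below is an assumed, illustrative choice.
    return soft_threshold(w_avg, lam * eta0 / np.sqrt(T))

# Toy usage: sparse least squares, one random row per stochastic gradient.
def make_oracle(A, b):
    def oracle(w, rng):
        i = rng.integers(A.shape[0])
        return A[i] * (A[i] @ w - b[i])
    return oracle

rng = np.random.default_rng(1)
A = rng.normal(size=(500, 50))
x_true = np.zeros(50)
x_true[:5] = 1.0
b = A @ x_true
w_hat = sco_l1_sparse_o2b(make_oracle(A, b), dim=50, lam=0.1, T=5000, eta0=0.5)
print("nonzeros:", np.count_nonzero(w_hat))
```

The key design point the sketch conveys is that the sparsification is applied once, at the conversion step, so the per-iteration updates and hence the convergence analysis of the underlying SCO method are left untouched.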
