Accelerated proximal incremental algorithm schemes for non-strongly convex functions

There have been a number of recent advances in accelerated gradient and proximal schemes for the optimization of convex finite-sum problems. Defazio introduced a simple accelerated scheme for incremental stochastic proximal algorithms, called Point-SAGA, inspired by gradient-based methods such as SAGA. He proved O(1/k) convergence for non-smooth functions, but only under the assumption that the component terms are strongly convex. We introduce a slight modification of his scheme, called MP-SAGA, for which we can prove O(1/k) convergence without strong convexity, albeit for smooth functions. Numerical results show that our method converges faster than, or comparably to, Defazio's scheme, even for non-strongly convex functions. As important special cases, we also derive accelerated schemes for a multi-class formulation of the SVM as well as for clustering based on SON (sum-of-norms) regularization. Finally, we introduce a simplification of Point-SAGA, called SP-SAGA, for problems such as SON clustering that have a large number of variables and a sparse relation between variables and objective terms. A sketch of the baseline iteration follows.
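For concreteness, below is a minimal sketch of the Point-SAGA iteration of Defazio [7], the baseline that MP-SAGA and SP-SAGA modify, instantiated for least-squares components, whose proximal operator has a closed form. The function name, step size, and data are illustrative choices for this sketch, not values taken from the paper.

```python
import numpy as np

def point_saga(A, b, gamma=0.05, epochs=50, seed=0):
    """Point-SAGA [7] on f(x) = (1/n) * sum_j (1/2) * (a_j @ x - b_j)**2.

    Each component f_j has the closed-form proximal operator
        prox_{gamma f_j}(z) = z - gamma * a_j * (a_j @ z - b_j) / (1 + gamma * ||a_j||^2).
    The step size gamma here is an illustrative choice, not a tuned value.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    g = np.zeros((n, d))      # table of stored component gradients g_j
    g_avg = g.mean(axis=0)    # (1/n) * sum_j g_j, maintained incrementally
    for _ in range(epochs * n):
        j = rng.integers(n)
        a = A[j]
        # Step 1: extrapolate using the stored gradient of component j:
        #   z_j^k = x^k + gamma * (g_j^k - g_avg)
        z = x + gamma * (g[j] - g_avg)
        # Step 2: proximal step on the sampled component:
        #   x^{k+1} = prox_{gamma f_j}(z_j^k), closed form for least squares
        x_new = z - gamma * a * (a @ z - b[j]) / (1.0 + gamma * (a @ a))
        # Step 3: update the gradient table and its running average:
        #   g_j^{k+1} = (z_j^k - x^{k+1}) / gamma
        g_new = (z - x_new) / gamma
        g_avg += (g_new - g[j]) / n
        g[j] = g_new
        x = x_new
    return x

# Usage: recover a planted solution from consistent linear measurements.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 10))
x_star = rng.normal(size=10)
b = A @ x_star
print(np.linalg.norm(point_saga(A, b) - x_star))  # should be small
```

The only problem-specific ingredient is the proximal operator in step two; swapping in a different closed-form prox adapts the sketch to other component functions, such as the pairwise norm terms arising in SON clustering.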

[1] Optimal Smoothed Variable Sample-size Accelerated Proximal Methods for Structured Nonsmooth Stochastic Convex Programs, 2018.

[2] Alex Krizhevsky et al., Learning Multiple Layers of Features from Tiny Images, 2009.

[3] Zeyuan Allen-Zhu et al., Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives, ICML, 2015.

[4] Yurii Nesterov, Smooth minimization of non-smooth functions, Math. Program., 2005.

[5] Gaël Varoquaux et al., Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 2011.

[6] J. Moreau, Proximité et dualité dans un espace hilbertien, 1965.

[7] Aaron Defazio, A Simple Practical Accelerated Method for Finite Sums, NIPS, 2016.

[8] Tong Zhang et al., Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, NIPS, 2013.

[9] Patrick L. Combettes et al., Proximal Splitting Methods in Signal Processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, 2009.

[10] Jorge Nocedal et al., Optimization Methods for Large-Scale Machine Learning, SIAM Rev., 2016.

[11] Marc Teboulle et al., Smoothing and First Order Methods: A Unified Framework, SIAM J. Optim., 2012.

[12] Stephen P. Boyd et al., Proximal Algorithms, Found. Trends Optim., 2013.

[13] Francis Bach et al., SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, NIPS, 2014.

[14] Lin Xiao et al., A Proximal Stochastic Gradient Method with Progressive Variance Reduction, SIAM J. Optim., 2014.

[15] Francis R. Bach et al., Clusterpath: An Algorithm for Clustering using Convex Fusion Penalties, ICML, 2011.

[16] Amir Beck, First-Order Methods in Optimization, 2017.

[17] Mark W. Schmidt et al., Minimizing Finite Sums with the Stochastic Average Gradient, Mathematical Programming, 2013.

[18] R. Fisher, The Use of Multiple Measurements in Taxonomic Problems, 1936.

[19] Devdatt P. Dubhashi et al., Clustering by Sum of Norms: Stochastic Incremental Algorithm, Convergence and Cluster Recovery, ICML, 2017.