Accelerated proximal incremental algorithm schemes for non-strongly convex functions

There have been a number of recent advances in accelerated gradient and proximal schemes for the optimization of convex finite-sum problems. Defazio introduced a simple accelerated scheme for incremental stochastic proximal algorithms, called Point-SAGA, inspired by gradient-based methods such as SAGA. He proved O(1/k) convergence for non-smooth functions, but only under the assumption that the component terms are strongly convex. We introduce a slight modification of his scheme, called MP-SAGA, for which we can prove O(1/k) convergence without strong convexity, albeit for smooth functions. Numerical results show that our method converges faster than, or comparably to, Defazio's scheme, even for non-strongly convex functions. As important special cases, we also derive accelerated schemes for a multi-class formulation of the SVM as well as for clustering based on SON (sum-of-norms) regularization. Finally, we introduce a simplification of Point-SAGA, called SP-SAGA, for problems such as SON clustering that have a large number of variables and a sparse relation between variables and objective terms. A sketch of the baseline iteration follows.
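For concreteness, below is a minimal sketch of the Point-SAGA iteration of Defazio [7], the baseline that MP-SAGA and SP-SAGA modify, instantiated for least-squares components, whose proximal operator has a closed form. The function name, step size, and data are illustrative choices for this sketch, not values taken from the paper.

```python
import numpy as np

def point_saga(A, b, gamma=0.05, epochs=50, seed=0):
    """Point-SAGA [7] on f(x) = (1/n) * sum_j (1/2) * (a_j @ x - b_j)**2.

    Each component f_j has the closed-form proximal operator
        prox_{gamma f_j}(z) = z - gamma * a_j * (a_j @ z - b_j) / (1 + gamma * ||a_j||^2).
    The step size gamma here is an illustrative choice, not a tuned value.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    g = np.zeros((n, d))      # table of stored component gradients g_j
    g_avg = g.mean(axis=0)    # (1/n) * sum_j g_j, maintained incrementally
    for _ in range(epochs * n):
        j = rng.integers(n)
        a = A[j]
        # Step 1: extrapolate using the stored gradient of component j:
        #   z_j^k = x^k + gamma * (g_j^k - g_avg)
        z = x + gamma * (g[j] - g_avg)
        # Step 2: proximal step on the sampled component:
        #   x^{k+1} = prox_{gamma f_j}(z_j^k), closed form for least squares
        x_new = z - gamma * a * (a @ z - b[j]) / (1.0 + gamma * (a @ a))
        # Step 3: update the gradient table and its running average:
        #   g_j^{k+1} = (z_j^k - x^{k+1}) / gamma
        g_new = (z - x_new) / gamma
        g_avg += (g_new - g[j]) / n
        g[j] = g_new
        x = x_new
    return x

# Usage: recover a planted solution from consistent linear measurements.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 10))
x_star = rng.normal(size=10)
b = A @ x_star
print(np.linalg.norm(point_saga(A, b) - x_star))  # should be small
```

The only problem-specific ingredient is the proximal operator in step two; swapping in a different closed-form prox adapts the sketch to other component functions, such as the pairwise norm terms arising in SON clustering.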

[1] Optimal Smoothed Variable Sample-size Accelerated Proximal Methods for Structured Nonsmooth Stochastic Convex Programs, 2018.

[2] Alex Krizhevsky et al., Learning Multiple Layers of Features from Tiny Images, 2009.

[3] Zeyuan Allen-Zhu et al., Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives, ICML, 2015.

[4] Yurii Nesterov, Smooth minimization of non-smooth functions, Math. Program., 2005.

[5] Gaël Varoquaux et al., Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 2011.

[6] J. Moreau, Proximité et dualité dans un espace hilbertien, 1965.

[7] Aaron Defazio, A Simple Practical Accelerated Method for Finite Sums, NIPS, 2016.

[8] Tong Zhang et al., Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, NIPS, 2013.

[9] Patrick L. Combettes et al., Proximal Splitting Methods in Signal Processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, 2009.

[10] Jorge Nocedal et al., Optimization Methods for Large-Scale Machine Learning, SIAM Rev., 2016.

[11] Marc Teboulle et al., Smoothing and First Order Methods: A Unified Framework, SIAM J. Optim., 2012.

[12] Stephen P. Boyd et al., Proximal Algorithms, Found. Trends Optim., 2013.

[13] Francis Bach et al., SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, NIPS, 2014.

[14] Lin Xiao et al., A Proximal Stochastic Gradient Method with Progressive Variance Reduction, SIAM J. Optim., 2014.

[15] Francis R. Bach et al., Clusterpath: An Algorithm for Clustering using Convex Fusion Penalties, ICML, 2011.

[16] Amir Beck, First-Order Methods in Optimization, 2017.

[17] Mark W. Schmidt et al., Minimizing Finite Sums with the Stochastic Average Gradient, Mathematical Programming, 2013.

[18] R. Fisher, The Use of Multiple Measurements in Taxonomic Problems, 1936.

[19] Devdatt P. Dubhashi et al., Clustering by Sum of Norms: Stochastic Incremental Algorithm, Convergence and Cluster Recovery, ICML, 2017.