MASAGA: A Linearly-Convergent Stochastic First-Order Method for Optimization on Manifolds

We consider the stochastic optimization of finite sums over a Riemannian manifold where the functions are smooth and convex. We present MASAGA, an extension of the stochastic average gradient variant SAGA to Riemannian manifolds. SAGA is a variance-reduction technique that typically outperforms methods relying on expensive full-gradient calculations, such as the stochastic variance-reduced gradient (SVRG) method. We show that MASAGA achieves a linear convergence rate with uniform sampling, and a faster linear rate with non-uniform sampling. Our experiments show that MASAGA is faster than the recent Riemannian stochastic gradient descent algorithm on the classic problem of finding the leading eigenvector of a matrix. Code related to this paper is available at: https://github.com/IssamLaradji/MASAGA.

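To make the idea concrete, below is a minimal, hypothetical Python sketch of a SAGA-style variance-reduced step on the unit sphere for the leading-eigenvector problem described in the abstract. It is not the paper's MASAGA implementation (see the linked repository for that): for simplicity it re-projects stored gradients onto the current tangent space rather than parallel-transporting them, uses uniform sampling, and the step size and function names are illustrative assumptions.

```python
import numpy as np

def masaga_sphere_sketch(Z, n_iters=5000, step=None, seed=0):
    """Hypothetical sketch: SAGA-style variance reduction on the unit sphere
    for max_{||x||=1} x^T (Z^T Z / n) x, written as the finite sum
    f(x) = (1/n) sum_i f_i(x) with f_i(x) = -(z_i^T x)^2.
    Step size and update details are illustrative, not the paper's settings."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    if step is None:
        # Rough step-size guess based on a crude Lipschitz estimate.
        step = n / (2.0 * np.linalg.norm(Z, ord=2) ** 2)

    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)

    def egrad(i, x):
        # Euclidean gradient of f_i at x.
        return -2.0 * (Z[i] @ x) * Z[i]

    # SAGA memory: one stored gradient per example, plus their running mean.
    memory = np.stack([egrad(i, x) for i in range(n)])
    mem_mean = memory.mean(axis=0)

    def proj(x, g):
        # Projection onto the tangent space of the sphere at x.
        return g - (x @ g) * x

    def retract(x, v):
        # Retraction back onto the sphere (normalization).
        y = x + v
        return y / np.linalg.norm(y)

    for _ in range(n_iters):
        i = rng.integers(n)
        g_i = egrad(i, x)
        # Variance-reduced direction; stored gradients are simply re-projected
        # onto the current tangent space (a simplification of transport).
        v = proj(x, g_i - memory[i] + mem_mean)
        x = retract(x, -step * v)
        # Update the SAGA memory and its running mean.
        mem_mean += (g_i - memory[i]) / n
        memory[i] = g_i

    return x
```

As a sanity check under these assumptions, the returned vector can be compared against the top eigenvector of Z.T @ Z / n computed with np.linalg.eigh; the variance-reduced direction v above is what gives the SAGA-style method its linear convergence, in contrast to plain Riemannian SGD whose single-sample gradient variance does not vanish at the optimum.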