Averaging Stochastic Gradient Descent on Riemannian Manifolds

We consider the minimization of a function defined on a Riemannian manifold $\mathcal{M}$ when the function is accessible only through unbiased estimates of its gradients. We develop a geometric framework to transform a sequence of slowly converging iterates generated by stochastic gradient descent (SGD) on $\mathcal{M}$ into an averaged iterate sequence with a robust and fast $O(1/n)$ convergence rate. We then present an application of our framework to geodesically strongly convex (and possibly Euclidean non-convex) problems. Finally, we demonstrate how these ideas apply to streaming $k$-PCA, where we show how to accelerate the slow rate of the randomized power method (without requiring knowledge of the eigengap) into a robust algorithm achieving the optimal rate of convergence.
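To make the averaging idea concrete, the following is a minimal sketch for the simplest case, $k = 1$, where the manifold is the unit sphere. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function names (`normalize`, `geodesic_step`, `averaged_oja`), the step size $\gamma_n = c/\sqrt{n}$, the starting point, and the choice of a geodesic running average (move the average a fraction $1/(n+1)$ of the way toward the newest iterate) are all hypothetical choices made for this example.

```python
import math
import random


def normalize(v):
    """Map a nonzero vector back onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def geodesic_step(base, target, t):
    """Move a fraction t along the sphere geodesic from base toward target."""
    cos_theta = max(-1.0, min(1.0, sum(b * y for b, y in zip(base, target))))
    theta = math.acos(cos_theta)
    # Component of target orthogonal to base: the tangent direction at base.
    direction = [y - cos_theta * b for b, y in zip(base, target)]
    norm = math.sqrt(sum(c * c for c in direction))
    if norm < 1e-12:
        # target equals base (or is antipodal): no well-defined direction.
        return list(base)
    unit = [c / norm for c in direction]
    moved = [math.cos(t * theta) * b + math.sin(t * theta) * u
             for b, u in zip(base, unit)]
    return normalize(moved)  # guard against floating-point drift


def averaged_oja(samples, step_scale=0.5):
    """Oja-type SGD on the sphere plus a geodesic running average.

    Assumptions (illustrative only): step size step_scale / sqrt(n) and a
    fixed starting point proportional to the all-ones vector.
    Returns (last_iterate, averaged_iterate).
    """
    dim = len(samples[0])
    x = normalize([1.0] * dim)
    avg = list(x)
    for n, h in enumerate(samples, start=1):
        gamma = step_scale / math.sqrt(n)
        proj = sum(hi * xi for hi, xi in zip(h, x))  # h^T x
        # Stochastic step along h h^T x, then retract by renormalizing.
        x = normalize([xi + gamma * proj * hi for xi, hi in zip(x, h)])
        # Running average computed on the manifold, not in the ambient space.
        avg = geodesic_step(avg, x, 1.0 / (n + 1))
    return x, avg


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stream with covariance diag(3, 1, 0.5); top eigenvector is e_1.
    scales = [math.sqrt(3.0), 1.0, math.sqrt(0.5)]
    stream = [[s * rng.gauss(0.0, 1.0) for s in scales] for _ in range(5000)]
    last, avg = averaged_oja(stream)
    print("alignment of last iterate    :", abs(last[0]))
    print("alignment of averaged iterate:", abs(avg[0]))
```

The average is updated along geodesics so that it stays on the sphere at every step; a plain Euclidean mean of the iterates would leave the manifold and would need a separate projection. Alignment is reported in absolute value because an eigenvector is only defined up to sign.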
