Stochastic Gradient Descent on Riemannian Manifolds

Stochastic gradient descent is a simple approach for finding local minima of a cost function whose evaluations are corrupted by noise. In this paper, we develop a procedure extending stochastic gradient descent algorithms to the case where the cost function is defined on a Riemannian manifold. We prove that, as in the Euclidean case, the algorithm converges to a critical point of the cost function. The algorithm has numerous potential applications and is illustrated here through four examples; in particular, a novel gossip algorithm on the set of covariance matrices is derived and tested numerically.
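To make the scheme concrete, the sketch below runs stochastic gradient descent on a simple Riemannian manifold, the unit sphere. The setup (an Oja-type leading-eigenvector problem with noisy rank-one samples, Robbins-Monro step sizes, and normalization as the retraction) is an illustrative assumption, not the paper's own examples or notation: the Riemannian gradient is obtained by projecting the Euclidean gradient onto the tangent space, and each step is mapped back onto the manifold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: minimize f(x) = -x^T A x over the unit sphere
# S^{n-1}, observing only noisy samples z with E[z z^T] = A. The minimizer
# is the leading eigenvector of A.
n = 5
eigvals = np.array([5.0, 1.0, 0.5, 0.2, 0.1])     # dominant first direction
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigvals) @ Q.T

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                            # start on the sphere

for t in range(1, 20001):
    z = rng.multivariate_normal(np.zeros(n), A)   # noisy sample, E[zz^T] = A
    euclid_grad = -2.0 * (z @ x) * z              # gradient of -x^T (zz^T) x
    riem_grad = euclid_grad - (euclid_grad @ x) * x  # project onto tangent space
    step = 1.0 / (10.0 + t)                       # Robbins-Monro step sizes
    x = x - step * riem_grad                      # move in the tangent direction
    x /= np.linalg.norm(x)                        # retraction: back to the sphere

alignment = abs(x @ Q[:, 0])                      # overlap with true top eigenvector
print(f"alignment with top eigenvector: {alignment:.3f}")
```

Normalization is used here as a cheap retraction in place of the sphere's exponential map; with decreasing step sizes satisfying the usual Robbins-Monro conditions, the iterate stays on the manifold and drifts toward a critical point of the cost.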
