Stochastic Gradient Descent on Riemannian Manifolds

Stochastic gradient descent is a simple approach for finding local minima of a cost function whose evaluations are corrupted by noise. In this paper, we develop a procedure extending stochastic gradient descent algorithms to the case where the cost function is defined on a Riemannian manifold. We prove that, as in the Euclidean case, the algorithm converges to a critical point of the cost function. The algorithm has numerous potential applications and is illustrated here through four examples; in particular, a novel gossip algorithm on the set of covariance matrices is derived and tested numerically.
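To make the scheme concrete, the sketch below runs stochastic gradient descent on a simple Riemannian manifold, the unit sphere. The setup (an Oja-type leading-eigenvector problem with noisy rank-one samples, Robbins-Monro step sizes, and normalization as the retraction) is an illustrative assumption, not the paper's own examples or notation: the Riemannian gradient is obtained by projecting the Euclidean gradient onto the tangent space, and each step is mapped back onto the manifold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: minimize f(x) = -x^T A x over the unit sphere
# S^{n-1}, observing only noisy samples z with E[z z^T] = A. The minimizer
# is the leading eigenvector of A.
n = 5
eigvals = np.array([5.0, 1.0, 0.5, 0.2, 0.1])     # dominant first direction
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigvals) @ Q.T

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                            # start on the sphere

for t in range(1, 20001):
    z = rng.multivariate_normal(np.zeros(n), A)   # noisy sample, E[zz^T] = A
    euclid_grad = -2.0 * (z @ x) * z              # gradient of -x^T (zz^T) x
    riem_grad = euclid_grad - (euclid_grad @ x) * x  # project onto tangent space
    step = 1.0 / (10.0 + t)                       # Robbins-Monro step sizes
    x = x - step * riem_grad                      # move in the tangent direction
    x /= np.linalg.norm(x)                        # retraction: back to the sphere

alignment = abs(x @ Q[:, 0])                      # overlap with true top eigenvector
print(f"alignment with top eigenvector: {alignment:.3f}")
```

Normalization is used here as a cheap retraction in place of the sphere's exponential map; with decreasing step sizes satisfying the usual Robbins-Monro conditions, the iterate stays on the manifold and drifts toward a critical point of the cost.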
