Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold

In this paper we extend the natural gradient method for neural networks to the case where the weight vectors are constrained to be orthonormal, that is, to lie on the Stiefel manifold. The proposed methods numerically integrate the gradient flow without violating the manifold constraint, and the extensions are based on geodesics. Exploiting the fact that the Stiefel manifold is a homogeneous space on which the orthogonal group acts transitively, we rigorously formulate the previously proposed natural gradient and geodesics on the manifold. Building on this fact, we further develop a simpler updating rule and a one-parameter family of its generalizations. The effectiveness of the proposed methods is validated by experiments in minor subspace analysis and independent component analysis.
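As a rough illustration of why a left action of the orthogonal group keeps iterates on the Stiefel manifold St(n, p) = {W : W^T W = I_p}, the Python/NumPy sketch below assumes the canonical metric, a skew-symmetric generator A = G W^T - W G^T built from the Euclidean gradient G, and a minor subspace objective f(W) = 0.5 tr(W^T C W). These choices and all function names (expm_skew, canonical_gradient, orbit_step, minor_subspace) are illustrative assumptions. The update W <- exp(-eta A) W is a generic one-parameter-subgroup step and is not claimed to be the exact quasi-geodesic rule or the one-parameter family developed in the paper.

```python
import numpy as np


def expm_skew(a):
    """Exponential of a real skew-symmetric matrix a (a.T == -a).

    h = 1j * a is Hermitian, so a = -1j * h = V diag(-1j * w) V^H and
    exp(a) = V diag(exp(-1j * w)) V^H, which is real and orthogonal.
    """
    w, v = np.linalg.eigh(1j * a)
    return ((v * np.exp(-1j * w)) @ v.conj().T).real


def canonical_gradient(w, g):
    """Riemannian gradient G - W G^T W on St(n, p) under the canonical metric."""
    return g - w @ g.T @ w


def orbit_step(w, g, eta):
    """One update W <- exp(-eta * A) W with skew-symmetric A = G W^T - W G^T.

    Left multiplication by the orthogonal matrix exp(-eta * A) keeps
    W^T W = I (up to round-off). Since A W = G - W G^T W when W^T W = I,
    the curve exp(-t A) W starts in the direction of the negative
    canonical gradient.
    """
    a = g @ w.T - w @ g.T
    return expm_skew(-eta * a) @ w


def minor_subspace(c, p, eta=0.1, iters=2000, seed=0):
    """Minimise f(W) = 0.5 * trace(W^T C W) over St(n, p) (hypothetical MSA demo)."""
    rng = np.random.default_rng(seed)
    # Random starting point on the manifold: orthonormal columns from QR.
    w, _ = np.linalg.qr(rng.standard_normal((c.shape[0], p)))
    for _ in range(iters):
        w = orbit_step(w, c @ w, eta)  # Euclidean gradient of f is C W
    return w


# Usage: recover the span of the two minor eigenvectors of a 6x6 matrix.
rng = np.random.default_rng(1)
q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
c = q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) @ q.T
w = minor_subspace(c, p=2)
print(np.allclose(w.T @ w, np.eye(2)))  # True: constraint preserved
u = np.linalg.eigh(c)[1][:, :2]  # eigenvectors of the two smallest eigenvalues
print(np.allclose(w @ w.T, u @ u.T, atol=1e-6))  # True: minor subspace found
```

Because exp(-eta A) is orthogonal, no re-orthonormalisation step is needed; the cost is an n-by-n matrix exponential per iteration. The eigendecomposition trick in expm_skew only avoids a SciPy dependency, and scipy.linalg.expm would serve equally well.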
