Quasi-Geodesic Neural Learning Algorithms Over the Orthogonal Group: A Tutorial

This contribution presents a tutorial on learning algorithms for a single neural layer whose connection matrix belongs to the orthogonal group. The algorithms use geodesics, suitably connected, as piecewise approximate integrals of the exact differential learning equation. The learning equations considered arise essentially from Riemannian-gradient-based optimization theory, with either a deterministic or a diffusion-type gradient. Specifically, the paper aims to review the relevant mathematics, presenting it as transparently as possible so that it is accessible to readers without a background in differential geometry; to bring together modern optimization methods on manifolds; and to compare the different algorithms on a common machine learning problem. As a numerical case study, we consider an application to non-negative independent component analysis (ICA), although Riemannian gradient methods give rise to general-purpose algorithms that are by no means limited to ICA-related applications.
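To make the idea of a geodesic-based learning step concrete, the following is a minimal sketch of one deterministic update on the orthogonal group, not the paper's own implementation. It assumes the connection matrix W is updated as W ← exp(−η Ω) W, with Ω = G Wᵀ − W Gᵀ built from the Euclidean gradient G of the cost. The function names, the toy cost 0.5·‖W − A‖² with a randomly generated orthogonal target A, and the step size are all hypothetical choices made for illustration. The matrix exponential is computed with NumPy alone, using the fact that a real skew-symmetric matrix times the imaginary unit is Hermitian.

```python
import numpy as np


def skew_exp(omega, t=1.0):
    """Return exp(t * omega) for a real skew-symmetric matrix omega."""
    # 1j * omega is Hermitian, so eigh yields real eigenvalues lam and a
    # unitary eigenvector matrix v with 1j * omega = v diag(lam) v^H.
    lam, v = np.linalg.eigh(1j * omega)
    # Hence omega = -1j * v diag(lam) v^H and
    # exp(t * omega) = v diag(exp(-1j * t * lam)) v^H, which is real.
    return ((v * np.exp(-1j * t * lam)) @ v.conj().T).real


def geodesic_step(w, euclid_grad, eta):
    """One geodesic descent step that keeps w orthogonal (assumed update rule)."""
    m = euclid_grad @ w.T
    omega = m - m.T  # skew-symmetric search direction
    return skew_exp(omega, -eta) @ w


def toy_cost(w, target):
    """Hypothetical cost used only for this illustration."""
    return 0.5 * np.sum((w - target) ** 2)


def run_demo(n=4, steps=200, eta=0.1, seed=0):
    """Chain geodesic steps as a piecewise approximation of the learning flow."""
    rng = np.random.default_rng(seed)
    target, _ = np.linalg.qr(rng.standard_normal((n, n)))
    if np.linalg.det(target) < 0:
        target[:, 0] = -target[:, 0]  # keep the target reachable from the identity
    w = np.eye(n)
    cost_start = toy_cost(w, target)
    for _ in range(steps):
        grad = w - target  # Euclidean gradient of the toy cost
        w = geodesic_step(w, grad, eta)
    return w, cost_start, toy_cost(w, target)


if __name__ == "__main__":
    w_final, c0, c1 = run_demo()
    print("cost before:", c0, "cost after:", c1)
    print("orthogonality error:", np.linalg.norm(w_final.T @ w_final - np.eye(4)))
```

Because each step multiplies W by an orthogonal factor, the iterate stays on the group up to rounding error, with no separate re-orthogonalization step. A diffusion-type variant would add a random skew-symmetric perturbation to Ω, which is not shown here.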
