Fast orthogonal recurrent neural networks employing a novel parametrisation for orthogonal matrices

Abstract Training Recurrent Neural Networks (RNNs) is challenging due to the vanishing/exploding gradient problem. Recent work addresses this problem by constraining the recurrent transition matrix to be unitary/orthogonal during training, but existing approaches are either limited in capacity or involve time-consuming operations, e.g., evaluating the derivative of a long matrix chain product, the matrix exponential, or the singular value decomposition. This paper tackles the problem from a geometric view, based on exponentials of sparse antisymmetric matrices that have one or more nonzero columns and an equal number of nonzero rows. An analytical expression is presented that simplifies the computation of such sparse antisymmetric matrix exponentials and, in effect, yields a novel formula for parameterizing orthogonal matrices. The proposed algorithms are fast, tunable, and full-capacity: the target variable is updated by optimizing a matrix multiplier rather than by explicit gradient descent. Experiments demonstrate the superior performance of the proposed algorithms.
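The geometric fact underlying the abstract can be checked numerically: the exponential of any antisymmetric matrix is an orthogonal matrix with determinant one, and a sparse antisymmetric matrix of the kind described can be built from a few sparse vectors. The sketch below is illustrative only (the vectors `u`, `v` and the use of the general-purpose `scipy.linalg.expm` are assumptions; the paper's contribution is a closed-form expression that avoids a general matrix-exponential routine).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 8

# A sparse antisymmetric matrix: pick a sparse vector pair (u, v) and set
# A = u v^T - v u^T.  A is antisymmetric by construction, and its nonzero
# entries are confined to the few rows/columns where u or v is nonzero.
u = np.zeros(n); u[0] = 1.0
v = np.zeros(n); v[1:3] = rng.standard_normal(2)
A = np.outer(u, v) - np.outer(v, u)
assert np.allclose(A, -A.T)  # antisymmetric

# General-purpose matrix exponential (the paper derives an analytical
# expression specialized to this sparse structure instead).
Q = expm(A)

# exp(antisymmetric) lies in SO(n): orthogonal with determinant 1,
# since Q Q^T = exp(A) exp(A^T) = exp(A - A) = I and det Q = exp(tr A) = 1.
print(np.allclose(Q @ Q.T, np.eye(n)))    # orthogonality check
print(np.isclose(np.linalg.det(Q), 1.0))  # determinant check
```

Updating an orthogonal recurrent matrix W by left-multiplying with such a Q (as in "optimizing a matrix multiplier") keeps W exactly on the orthogonal group at every step, with cost governed by the sparsity of A rather than the full dimension n.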
