Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary. This ensures eigenvalues with unit norm, and thus stable dynamics and training. However, it comes at the cost of reduced expressivity due to the limited variety of orthogonal transformations. We propose a novel connectivity structure based on the Schur decomposition and a splitting of the Schur form into normal and non-normal parts. This allows us to parametrize matrices with unit-norm eigenspectra without orthogonality constraints on the eigenbases. The resulting architecture gives access to a larger space of spectrally constrained matrices, of which orthogonal matrices are a subset. This crucial difference retains the stability advantages and training speed of orthogonal RNNs while enhancing expressivity, especially on tasks that require computations over ongoing input sequences.
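To make the parametrization concrete, below is a minimal NumPy sketch of the idea described above: the recurrent matrix is built as W = P (R + T) Pᵀ, where P is an orthogonal basis, R is a block-diagonal collection of 2×2 rotations (so every eigenvalue has unit norm), and T is a strictly block-upper-triangular non-normal part that shapes transient dynamics without moving the spectrum. The helper name, the random initialization, and the 0.1 scaling of T are illustrative assumptions, not the paper's exact construction or training procedure.

```python
import numpy as np

def schur_parametrized_recurrent_matrix(n, scale=0.1, seed=None):
    """Sketch of a Schur-form parametrization W = P (R + T) P^T.

    R holds 2x2 rotation blocks (unit-norm eigenvalues); T is the strictly
    block-upper-triangular "non-normal" part. Illustrative only.
    """
    rng = np.random.default_rng(seed)

    # Orthogonal basis P (QR decomposition of a random Gaussian matrix).
    P, _ = np.linalg.qr(rng.standard_normal((n, n)))

    # Normal part R: block-diagonal 2x2 rotations -> eigenvalues exp(+/- i*theta).
    R = np.zeros((n, n))
    for k, theta in enumerate(rng.uniform(0.0, 2.0 * np.pi, size=n // 2)):
        c, s = np.cos(theta), np.sin(theta)
        R[2 * k:2 * k + 2, 2 * k:2 * k + 2] = [[c, -s], [s, c]]
    if n % 2:
        R[-1, -1] = rng.choice([-1.0, 1.0])  # odd dimension: one real +/-1 eigenvalue

    # Non-normal part T: strictly block-upper-triangular, so it feeds signals
    # "forward" across Schur modes without changing the eigenvalues.
    T = scale * np.triu(rng.standard_normal((n, n)), k=1)
    for k in range(n // 2):
        T[2 * k, 2 * k + 1] = 0.0  # keep the 2x2 diagonal blocks purely rotational

    return P @ (R + T) @ P.T

W = schur_parametrized_recurrent_matrix(6, seed=0)
print(np.abs(np.linalg.eigvals(W)))     # all ~1.0: unit-norm spectrum
print(np.allclose(W.T @ W, np.eye(6)))  # False: W is not orthogonal
```

In a trainable version of this idea, P, the rotation angles, and T would all be learned (with P kept orthogonal and the spectrum kept near the unit circle), but those details are beyond the scope of this sketch.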
