On the Memory Mechanism of Tensor-Power Recurrent Models

The tensor-power (TP) recurrent model is a family of non-linear dynamical systems whose recurrence relation consists of a p-fold (a.k.a. degree-p) tensor product. Although such models frequently appear in advanced recurrent neural networks (RNNs), to date there has been limited study of their memory property, a critical characteristic in sequence tasks. In this work, we conduct a thorough investigation of the memory mechanism of TP recurrent models. Theoretically, we prove that a large degree p is an essential condition for achieving the long memory effect, yet it leads to unstable dynamical behaviors. Empirically, we tackle this issue by extending the degree p from the discrete domain to a differentiable one, so that it can be efficiently learned from a variety of datasets. Taken together, the new model is expected to benefit from the long memory effect in a stable manner. We experimentally show that the proposed model achieves competitive performance compared to various advanced RNNs in both single-cell and seq2seq architectures. This work has been accepted to AISTATS 2021.
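
To make the idea of a degree-p tensor-power recurrence with a learnable, real-valued degree concrete, the following is a minimal sketch in PyTorch. It is not the authors' exact construction: it assumes a simplified formulation in which the p-fold interaction of the hidden state is approximated by a signed elementwise power, and the class name `TPRecurrentCell`, its parameters, and the clamping constant are illustrative choices.

```python
# Minimal sketch (assumption, not the paper's exact model): a tensor-power style
# recurrent cell whose degree p is relaxed from integers to a learnable positive
# real number. The p-fold tensor product is approximated here by a signed
# elementwise power sign(h) * |h|^p to keep the example small and differentiable.
import torch
import torch.nn as nn

class TPRecurrentCell(nn.Module):
    def __init__(self, input_size, hidden_size, init_degree=2.0):
        super().__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)  # recurrent weights
        self.U = nn.Linear(input_size, hidden_size, bias=True)    # input weights
        # Learnable degree p, parameterized through log(p) so that p stays positive.
        self.log_p = nn.Parameter(torch.log(torch.tensor(init_degree)))

    def forward(self, x_t, h_prev):
        p = torch.exp(self.log_p)
        # Signed fractional power of the previous hidden state (clamped for stability).
        powered = torch.sign(h_prev) * h_prev.abs().clamp_min(1e-6).pow(p)
        return torch.tanh(self.W(powered) + self.U(x_t))

# Usage: unroll the cell over a sequence of shape (T, batch, input_size).
cell = TPRecurrentCell(input_size=8, hidden_size=16)
x = torch.randn(5, 3, 8)
h = torch.zeros(3, 16)
for t in range(x.size(0)):
    h = cell(x[t], h)
```

In this sketch the gradient with respect to `log_p` lets the degree be fitted from data, mirroring the paper's idea of learning p in a differentiable domain, while the clamp and the tanh nonlinearity are pragmatic choices to keep the toy dynamics stable.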
