Parsimonious memory unit for recurrent neural networks with application to natural language processing

Abstract Recurrent Neural Networks (RNNs) have received considerable interest from the Artificial Intelligence (AI) research community over the last decade, due to their ability to learn complex internal structures that expose relevant information. However, standard RNNs fail to capture long-term dependencies, and gated RNNs such as the Long Short-Term Memory (LSTM) have been proposed to address this drawback. The LSTM requires four gates to learn both short- and long-term dependencies for a given sequence of basic elements. More recently, a new family of RNNs called the "Gated Recurrent Unit" (GRU) has been introduced. The GRU contains fewer gates (a reset and an update gate), but these gates are grouped without taking into account the latent relations between short- and long-term dependencies. The GRU therefore manages term dependencies identically across all hidden neurons. Moreover, training gated RNNs requires a large amount of data and, despite GPU cards that speed up learning, the processing time remains costly. This paper proposes a new RNN called the "Parsimonious Memory Unit" (PMU), based on the strong assumption that short- and long-term dependencies are related and that each hidden neuron should play a different role in order to better handle term dependencies. Experiments conducted on a small (short-term) spoken-dialogue data set from the DECODA project, a large (long-term) textual document corpus from 20 Newsgroups, and a language modeling task show that the proposed PMU-RNN reaches similar or even better performance (efficiency) with less processing time (improved portability), with a gain of 50%. Moreover, experiments on gate activity show that the proposed PMU manages term dependencies better than the GRU-RNN model.
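The baseline gating mechanism that the abstract contrasts against is the standard GRU, in which the reset and update gates are learned with separate parameters. The sketch below is a minimal, plain-NumPy illustration of that baseline (names such as GRUCell and step are hypothetical); it is not the proposed PMU, whose equations are not reproduced in this section.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal sketch of a standard GRU cell with a reset gate r_t and an update gate z_t."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)
        # One input matrix (W) and one recurrent matrix (U) per gate, plus the candidate state.
        self.W_z = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.U_z = rng.uniform(-scale, scale, (hidden_size, hidden_size))
        self.W_r = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.U_r = rng.uniform(-scale, scale, (hidden_size, hidden_size))
        self.W_h = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.U_h = rng.uniform(-scale, scale, (hidden_size, hidden_size))
        self.b_z = np.zeros(hidden_size)
        self.b_r = np.zeros(hidden_size)
        self.b_h = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        # Update gate: how much of the previous hidden state is kept (long-term path).
        z_t = sigmoid(self.W_z @ x_t + self.U_z @ h_prev + self.b_z)
        # Reset gate: how much of the previous state feeds the candidate (short-term path).
        r_t = sigmoid(self.W_r @ x_t + self.U_r @ h_prev + self.b_r)
        # Candidate state built from the reset previous state and the current input.
        h_tilde = np.tanh(self.W_h @ x_t + self.U_h @ (r_t * h_prev) + self.b_h)
        # Note: the two gates are parameterized independently, so short- and
        # long-term dependencies are handled by separate sets of weights.
        return (1.0 - z_t) * h_prev + z_t * h_tilde
```

Tying or sharing the gate parameters (for example, deriving the reset activation from the update activation) would be one way to make such a unit more parsimonious, in the spirit of the PMU's assumption that short- and long-term dependencies are related; the actual PMU formulation should be taken from the paper itself.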
