NEWLSTM: An Optimized Long Short-Term Memory Language Model for Sequence Prediction

The long short-term memory (LSTM) model, trained on language modeling tasks, overcomes the vanishing-gradient bottleneck of the traditional recurrent neural network (RNN) and performs well across many natural language processing tasks. However, although the LSTM alleviates the vanishing gradient problem of the RNN, information is still lost substantially over long-distance transmission, and the model retains practical limitations. In this paper, we propose NEWLSTM, an improved LSTM model that mitigates both the vanishing gradient problem and the LSTM's excessive parameter count. NEWLSTM correlates the cell state directly with the current input, merges the traditional LSTM's input and forget gates, and removes some components, which reduces the number of parameters, simplifies computation, and shortens iteration time. The resulting neural network model is used to capture the relationships within input sequences in order to predict language sequences. Experimental results on multiple test sets show that the improved model is simpler than traditional LSTM models and LSTM variants, offers better overall stability, and better handles the sparse-word problem.
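
The abstract describes the gate merging only at a high level and does not give the exact NEWLSTM equations, so the following is a minimal sketch, assuming a coupled-gate design in which the input gate is replaced by (1 - f_t): a single gate controls both how much of the previous cell state is kept and how much of the current candidate (computed from the current input) is written. The class name `CoupledGateLSTMCell` and all shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CoupledGateLSTMCell:
    """Hypothetical coupled-gate cell: one gate f both forgets and admits new input (i = 1 - f)."""

    def __init__(self, input_size, hidden_size, rng=None):
        rng = rng or np.random.default_rng(0)
        D, H = input_size, hidden_size
        scale = 1.0 / np.sqrt(H)
        # Three weight blocks (merged gate f, candidate g, output o) instead of the
        # standard LSTM's four, which is where the parameter saving comes from.
        self.W = rng.uniform(-scale, scale, size=(3 * H, D + H))
        self.b = np.zeros(3 * H)
        self.H = H

    def step(self, x, h_prev, c_prev):
        z = self.W @ np.concatenate([x, h_prev]) + self.b
        H = self.H
        f = sigmoid(z[:H])               # merged forget/input gate
        g = np.tanh(z[H:2 * H])          # candidate update computed from the current input
        o = sigmoid(z[2 * H:])           # output gate
        c = f * c_prev + (1.0 - f) * g   # coupled update: the input share is 1 - f
        h = o * np.tanh(c)
        return h, c

# Usage: run a few steps on random inputs.
rng = np.random.default_rng(1)
cell = CoupledGateLSTMCell(input_size=8, hidden_size=16)
h, c = np.zeros(16), np.zeros(16)
for _ in range(5):
    h, c = cell.step(rng.standard_normal(8), h, c)
print(h.shape, c.shape)  # (16,) (16,)
```

Relative to a standard LSTM cell, this sketch drops one full weight block, so the per-cell parameter count falls from 4H(D + H + 1) to 3H(D + H + 1), consistent with the parameter-reduction claim in the abstract.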
