MA-LSTM: A Multi-Attention Based LSTM for Complex Pattern Extraction

With the growth of data volume, computing power, and algorithms, deep learning has developed rapidly and shown excellent performance, and many deep learning models have recently been proposed to solve problems in different areas. A recurrent neural network (RNN) is a class of artificial neural networks in which connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior, which makes it applicable to tasks such as handwriting recognition and speech recognition. However, the RNN relies heavily on automatic parameter learning driven by the data flow and seldom considers the feature extraction capability of the gate mechanism. In this paper, we propose a novel architecture in which the forget gate is generated from multiple bases. Instead of the traditional single-layer fully-connected network, we use a Multiple Attention (MA) based network to generate the forget gate, which refines the optimization space of the gate function and improves the granularity with which the recurrent network approximates the mapping in the ground truth. Owing to the benefit of the MA structure in the gate mechanism, the proposed MA-LSTM model achieves better feature extraction capability than other known models.
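The abstract does not give the exact formulation, but the idea of replacing the single fully-connected forget gate with an attention-weighted mixture of several bases can be sketched as below. This is only an illustrative assumption: the class name MALSTMCell, the number of bases, and the forget_bases/attn layers are hypothetical and may differ from the paper's actual design.

```python
import torch
import torch.nn as nn


class MALSTMCell(nn.Module):
    """Sketch of an LSTM cell whose forget gate is a mixture of several
    attention-weighted bases instead of one fully-connected layer.
    Illustrative only; the paper's exact formulation may differ."""

    def __init__(self, input_size, hidden_size, num_bases=4):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_bases = num_bases
        # Standard input, output, and cell-candidate transforms.
        self.gates = nn.Linear(input_size + hidden_size, 3 * hidden_size)
        # Multiple candidate "bases" for the forget gate (hypothetical).
        self.forget_bases = nn.Linear(input_size + hidden_size,
                                      num_bases * hidden_size)
        # Attention scores that decide how to mix the bases (hypothetical).
        self.attn = nn.Linear(input_size + hidden_size, num_bases)

    def forward(self, x, state):
        h, c = state
        z = torch.cat([x, h], dim=-1)
        i, o, g = self.gates(z).chunk(3, dim=-1)
        # Mix the forget-gate bases with softmax attention weights.
        bases = self.forget_bases(z).view(-1, self.num_bases, self.hidden_size)
        alpha = torch.softmax(self.attn(z), dim=-1).unsqueeze(-1)
        f = torch.sigmoid((alpha * bases).sum(dim=1))
        # Standard LSTM state update using the attention-generated forget gate.
        c_next = f * c + torch.sigmoid(i) * torch.tanh(g)
        h_next = torch.sigmoid(o) * torch.tanh(c_next)
        return h_next, (h_next, c_next)


# Usage example with random data.
cell = MALSTMCell(input_size=16, hidden_size=32)
x = torch.randn(8, 16)
h0, c0 = torch.zeros(8, 32), torch.zeros(8, 32)
h1, (_, c1) = cell(x, (h0, c0))
```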
