Simplification of RNN and Its Performance Evaluation in Machine Translation

In this paper, we study the simplification of RNNs and propose new structures that enable faster learning and improve performance while reducing the number of learned parameters. We construct four types of RNNs with new gated structures and call them SGR (Simple Gated RNN). Each SGR has one or two gates and either includes or omits a weight matrix for the input. Comparison studies are performed to verify the effectiveness of our proposal. In machine translation experiments on a relatively small corpus, the proposed SGR achieves higher scores than LSTM and GRU, and trains approximately 1.7 times faster than GRU. However, as the number of layers and input weights increases, the scores of SGR do not improve as much as expected; this should be examined in future work. More detailed analysis is needed of performance on larger datasets and of the differences caused by multi-layering, input weights, and the number of gates.

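The abstract does not reproduce the SGR update equations, so the sketch below is only a rough illustration: a minimal single-gate recurrent cell in the spirit of the Minimal Gated Unit, which is one plausible reading of a "one gate" variant. The class name MinimalGatedCell and the exact gating equations are assumptions for illustration, not the authors' definition of SGR.

```python
# Illustrative sketch only: a minimal single-gate recurrent cell.
# The update rule below follows the Minimal Gated Unit and is an
# assumption; the paper's SGR equations may differ.
import torch
import torch.nn as nn


class MinimalGatedCell(nn.Module):  # hypothetical name
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # Single gate computed from the input and the previous hidden state.
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)
        # Candidate state; dropping the input term here would give a
        # "no weight for input" variant like the abstract describes.
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        f = torch.sigmoid(self.gate(torch.cat([x, h], dim=-1)))        # forget/update gate
        h_tilde = torch.tanh(self.candidate(torch.cat([x, f * h], dim=-1)))  # candidate state
        return (1.0 - f) * h + f * h_tilde                             # blended new hidden state


# Usage: one step over a batch of 8 examples, 32-dim inputs, 64-dim hidden state.
cell = MinimalGatedCell(32, 64)
h = torch.zeros(8, 64)
x = torch.randn(8, 32)
h = cell(x, h)
```

With a single gate, this cell has roughly two-thirds of the parameters of a GRU cell of the same size, which is the kind of reduction in learned parameters the abstract argues for.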