Greedy Layer-Wise Training of Long Short Term Memory Networks

Recent developments in Recurrent Neural Networks (RNNs), such as Long Short Term Memory (LSTM), have shown promising potential for modeling sequential data. Nevertheless, training an LSTM is not trivial when the architecture contains multiple layers. This difficulty originates from how the LSTM is initialized: starting from a poor initialization, gradient-based optimization often appears to converge to poor local solutions. In this paper, we explore an unsupervised pretraining mechanism for LSTM initialization, following the philosophy that unsupervised pretraining acts as a regularizer that guides the subsequent supervised training. We propose a novel encoder-decoder-based learning framework that initializes a multi-layer LSTM in a greedy layer-wise manner, in which each added LSTM layer is trained to retain the main information in the representation produced by the layers below it. A multi-layer LSTM trained with our method outperforms one trained with random initialization, with clear advantages on several tasks. Moreover, multi-layer LSTMs converge 4 times faster with our greedy layer-wise training method.
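
To make the greedy layer-wise scheme concrete, below is a minimal sketch assuming a PyTorch-style implementation: each added LSTM layer is pretrained as the encoder of a sequence autoencoder that reconstructs its own input, then its outputs become the input to the next layer. The per-timestep linear decoder, the helper names `pretrain_layer` and `greedy_pretrain`, and all hyperparameters are illustrative assumptions, not the paper's exact design.

```python
# Sketch of greedy layer-wise LSTM pretraining (illustrative, not the paper's exact setup).
import torch
import torch.nn as nn


def pretrain_layer(inputs, input_size, hidden_size, epochs=10, lr=1e-3):
    """Pretrain one LSTM layer to retain the information in `inputs`.

    inputs: tensor of shape (batch, time, input_size).
    Returns the pretrained LSTM layer (the encoder); the decoder is discarded.
    """
    encoder = nn.LSTM(input_size, hidden_size, batch_first=True)
    decoder = nn.Linear(hidden_size, input_size)  # illustrative per-timestep decoder
    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=lr
    )
    criterion = nn.MSELoss()

    for _ in range(epochs):
        optimizer.zero_grad()
        hidden_seq, _ = encoder(inputs)          # (batch, time, hidden_size)
        reconstruction = decoder(hidden_seq)     # (batch, time, input_size)
        loss = criterion(reconstruction, inputs) # retain the previous representation
        loss.backward()
        optimizer.step()
    return encoder


def greedy_pretrain(x, layer_sizes):
    """Stack LSTM layers one at a time, pretraining each on the
    representation produced by the layers below it."""
    layers = []
    current, input_size = x, x.size(-1)
    for hidden_size in layer_sizes:
        layer = pretrain_layer(current, input_size, hidden_size)
        with torch.no_grad():
            current, _ = layer(current)          # input for the next layer
        layers.append(layer)
        input_size = hidden_size
    return layers  # used to initialize the multi-layer LSTM before supervised fine-tuning


# Example: greedily pretrain a 3-layer LSTM on random sequences (batch 32, length 20, dim 8).
x = torch.randn(32, 20, 8)
pretrained_layers = greedy_pretrain(x, layer_sizes=[64, 64, 64])
```

The pretrained layers serve only as an initialization; the stacked LSTM is subsequently fine-tuned end to end with the supervised objective.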