BackPropagation Through Time
This report provides a detailed description of, and the necessary derivations for, the BackPropagation Through Time (BPTT) algorithm. BPTT is commonly used to train recurrent neural networks (RNNs). In contrast to feed-forward neural networks, an RNN is characterized by its ability to encode longer-range past information, which makes it well suited to sequential models. BPTT extends the ordinary BP algorithm to the recurrent neural architecture.

1 Basic Definitions

For a two-layer feed-forward neural network, we denote the input layer as x indexed by variable i, the hidden layer as s indexed by variable j, and the output layer as y indexed by variable k. The weight matrix that maps the input vector to the hidden layer is V, while the hidden layer is propagated to the output layer through the weight matrix W. In a simple recurrent neural network, we attach a time subscript t to every neural layer. The input layer consists of two components: x(t) and the previous activation of the hidden layer, s(t − 1), indexed by variable h. The corresponding weight matrix is U. Table 1 lists all the notations used in this report:

Neural layer    Description                          Index variable
x(t)            input layer                          i
s(t − 1)        previous hidden (state) layer        h
s(t)            hidden (state) layer                 j
y(t)            output layer                         k

Weight matrix   Description                          Index variables
V               Input layer → Hidden layer           i, j
U               Previous hidden layer → Hidden layer h, j
W               Hidden layer → Output layer          j, k

Table 1: Notations in the recurrent neural network.

Then, the recurrent neural network is processed as follows:

• Input layer → Hidden layer

    $s_j(t) = f(\mathrm{net}_j(t))$    (1)
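To make the notation concrete, below is a minimal sketch of one forward step of the simple recurrent network described above, using the matrices V, U, and W from Table 1. The use of NumPy, the sigmoid choice for the activation f, and the softmax output layer are assumptions made here for illustration; equation (1) in the excerpt only specifies the hidden-layer update s_j(t) = f(net_j(t)).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward_step(x_t, s_prev, V, U, W):
    """One forward step of the simple RNN of Table 1.

    x_t    : input vector x(t),               shape (n_in,)
    s_prev : previous hidden state s(t-1),    shape (n_hid,)
    V      : input -> hidden weights,         shape (n_hid, n_in)
    U      : prev. hidden -> hidden weights,  shape (n_hid, n_hid)
    W      : hidden -> output weights,        shape (n_out, n_hid)
    """
    # net_j(t) combines the current input with the previous hidden state.
    net_t = V @ x_t + U @ s_prev
    # Equation (1): s_j(t) = f(net_j(t)); f is assumed to be the sigmoid here.
    s_t = sigmoid(net_t)
    # Output layer: a softmax over W s(t), an assumption for illustration.
    y_t = softmax(W @ s_t)
    return s_t, y_t

# Example usage: unroll a short sequence in time with random weights.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 5, 4, 3
V = rng.normal(size=(n_hid, n_in))
U = rng.normal(size=(n_hid, n_hid))
W = rng.normal(size=(n_out, n_hid))
s = np.zeros(n_hid)
for t in range(3):
    x = rng.normal(size=n_in)
    s, y = rnn_forward_step(x, s, V, U, W)
    print(t, y)
```

Unrolling this step over the sequence, as in the loop above, is exactly the structure that BPTT later differentiates through when the gradients are propagated backward in time.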