Gradient Temporal Difference Networks

Temporal-difference (TD) networks (Sutton and Tanner, 2004) are a predictive representation of state in which each node is an answer to a question about future observations or about the answers to other questions. Unfortunately, existing algorithms for learning TD networks are known to diverge, even on very simple problems. In this paper we present the first sound learning rule for TD networks. Our approach is to develop a true gradient descent algorithm that accounts for all three roles performed by each node in the network: as state, as an answer, and as a target for other questions. Our algorithm combines gradient temporal-difference learning (Maei et al., 2009) with real-time recurrent learning (Williams and Zipser, 1994). We provide a generalisation of the Bellman equation that corresponds to the semantics of the TD network, and prove that our algorithm converges to a fixed point of this equation.
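The abstract names gradient temporal-difference learning as one of the two ingredients of the proposed algorithm. As a minimal sketch of just that ingredient (not the full TD-network learning rule from the paper), the following shows the GTD2 update of Sutton et al. applied to a single linear value prediction on a toy two-state cycle, with tabular (one-hot) features. The environment, function name, and step sizes are illustrative choices, not taken from the paper.

```python
import numpy as np

def gtd2_prediction(num_steps=100_000, alpha=0.1, beta=0.1, gamma=0.9):
    """GTD2 on a deterministic two-state cycle MRP with tabular features.

    State 0 -> state 1 with reward 1; state 1 -> state 0 with reward 0.
    True values: V(0) = 1 / (1 - gamma^2), V(1) = gamma * V(0).
    This toy setup is an assumption for illustration only.
    """
    theta = np.zeros(2)  # value-function weights (the prediction)
    w = np.zeros(2)      # auxiliary weights tracking the expected TD error
    s = 0
    for _ in range(num_steps):
        s_next = 1 - s
        r = 1.0 if s == 0 else 0.0
        # TD error for the transition (s, r, s_next)
        delta = r + gamma * theta[s_next] - theta[s]
        # GTD2 updates; with one-hot features phi = e_s and phi' = e_{s'},
        # theta += alpha * (phi - gamma * phi') * (w . phi) reduces to:
        theta[s] += alpha * w[s]
        theta[s_next] -= alpha * gamma * w[s]
        # w += beta * (delta - w . phi) * phi reduces to:
        w[s] += beta * (delta - w[s])
        s = s_next
    return theta

theta = gtd2_prediction()
print(theta)  # close to [1 / (1 - 0.81), 0.9 / (1 - 0.81)]
```

The paper's contribution is, roughly, to extend this kind of convergent gradient-based update to the full TD network, where each node's prediction also serves as state and as a target for other nodes, using RTRL-style sensitivity propagation; that machinery is not shown here.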

[1] Takaki Makino et al. On-line discovery of temporal-difference networks. ICML, 2008.

[2] Siegfried M. Rump et al. INTLAB - INTerval LABoratory. SCAN, 1998.

[3] Geoffrey E. Hinton et al. Learning internal representations by error propagation. 1986.

[4] John N. Tsitsiklis et al. Analysis of temporal-difference learning with function approximation. NIPS, 1996.

[5] Michael R. James et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems. UAI, 2004.

[6] J. Elman. Distributed Representations, Simple Recurrent Networks, and Grammatical Structure. 1991.

[7] Richard S. Sutton et al. Temporal-Difference Networks. NIPS, 2004.

[8] P. Werbos. Backwards Differentiation in AD and Neural Nets: Past Links and New Opportunities. 2006.

[9] Richard S. Sutton et al. TD(λ) networks: temporal-difference networks with eligibility traces. ICML, 2005.

[10] Takaki Makino et al. Proto-predictive representation of states with simple recurrent temporal-difference networks. ICML, 2009.

[11] Ronald J. Williams et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Computation, 1989.

[12] Richard S. Sutton et al. Temporal Abstraction in Temporal-difference Networks. NIPS, 2005.

[13] Richard S. Sutton et al. Temporal-Difference Networks with History. IJCAI, 2005.

[14] Richard S. Sutton et al. Learning to predict by the methods of temporal differences. Machine Learning, 1988.

[15] Shalabh Bhatnagar et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation. ICML, 2009.

[16] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian. Neural Computation, 1994.

[17] Shalabh Bhatnagar et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. NIPS, 2009.