Temporal-Difference Networks with History
Temporal-difference (TD) networks are a formalism for expressing and learning grounded world knowledge in a predictive form [Sutton and Tanner, 2005]. However, not all partially observable Markov decision processes can be efficiently learned with TD networks. In this paper, we extend TD networks by allowing the network-update process (answer network) to depend on the recent history of previous actions and observations rather than only on the most recent action and observation. We show that this extension enables the solution of a larger class of problems than can be solved by the original TD networks or by history-based methods alone. In addition, we apply TD networks to a problem that, while still simple, is significantly larger than has previously been considered. We show that history-extended TD networks can learn much of the common-sense knowledge of an egocentric gridworld domain with a single bit of perception.
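The core extension described above can be sketched in a few lines: instead of computing its prediction from only the most recent action and observation, the answer network computes it from a fixed-length window of recent action-observation pairs. The following is a minimal illustrative sketch, not the paper's algorithm: the toy world, the one-hot `features` encoding, the logistic node, and the delta-rule update are all assumptions made for the example, and the prediction is not conditioned on the current action as full TD networks would allow.

```python
import math
import random
from collections import deque

def features(history, k, n_actions=2, n_obs=2):
    """One-hot encode the last k (action, observation) pairs, plus a bias.

    When fewer than k pairs have been seen, the window is padded with a
    (0, 0) pair; this padding is a simplification for the sketch.
    """
    pairs = list(history)[-k:]
    pairs = [(0, 0)] * (k - len(pairs)) + pairs
    x = [1.0]  # bias term
    for a, o in pairs:
        for ai in range(n_actions):
            for oi in range(n_obs):
                x.append(1.0 if (a, o) == (ai, oi) else 0.0)
    return x

def predict(w, x):
    """Logistic answer-network node: probability that the target bit is 1."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

# Toy history-dependent world: the next observation bit equals the action
# taken two steps earlier, so no single (action, observation) pair
# determines it, but a 2-step history window does.
random.seed(0)
k = 2
w = [0.0] * (1 + k * 4)          # bias + 4 one-hot features per pair
history = deque(maxlen=k)        # recent (action, observation) pairs
acts = deque([0, 0], maxlen=2)   # last two actions taken
errs = []
for t in range(8000):
    a = random.randint(0, 1)     # behave randomly
    x = features(history, k)
    y = predict(w, x)            # prediction of the next observation bit
    o_next = acts[0]             # world dynamics: o_{t+1} = a_{t-2}
    # logistic delta-rule step toward the observed target
    w = [wi + 0.1 * (o_next - y) * xi for wi, xi in zip(w, x)]
    errs.append(abs(o_next - y))
    history.append((a, o_next))
    acts.append(a)

mean_err = sum(errs[-1000:]) / 1000.0   # error over the final 1000 steps
```

With the 2-step window the prediction target is a linear function of the features and the error drives toward zero; a window of length 1 (or the original single action-observation pair) could not represent this dependency, which is the intuition behind the history extension.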
[1] Sebastian Thrun et al. Learning Low Dimensional Predictive Representations. ICML, 2004.
[2] Richard S. Sutton et al. Temporal-Difference Networks. NIPS, 2004.
[3] H. Jaeger. Discrete-Time, Discrete-Valued Observable Operator Models: A Tutorial. 2003.
[4] Richard S. Sutton et al. Predictive Representations of State. NIPS, 2001.
[5] Michael R. James et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems. UAI, 2004.