Temporal Abstraction in Temporal-difference Networks

We present a generalization of temporal-difference networks to include temporally abstract options on the links of the question network. Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment. These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that they are about what would happen if an action or sequence of actions were taken. In conventional TD networks, the inter-related predictions are at successive time steps and contingent on a single action; here we generalize them to accommodate extended time intervals and contingency on whole ways of behaving. Our generalization is based on the options framework for temporal abstraction. The primary contribution of this paper is to introduce a new algorithm for intra-option learning in TD networks with function approximation and eligibility traces. We present empirical examples of our algorithm's effectiveness and of the greater representational expressiveness of temporally abstract TD networks.
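To fix ideas, the following is a minimal sketch, under assumptions of our own, of what an option-conditional TD(λ) update for a single question-network node might look like with linear function approximation. It is illustrative only, not the paper's algorithm: all names (`Option`, `intra_option_td_update`, `pi`, `beta`) are hypothetical, and the specific trace-decay and bootstrapping choices are one plausible instantiation of intra-option learning, in which the update is applied only when the behavior action is consistent with the option's policy.

```python
import numpy as np

# Hypothetical sketch: one intra-option TD(lambda) update for a single
# question-network node whose link is labeled with an option.  Names and
# update details are assumptions for illustration, not the paper's method.

class Option:
    """An option: a policy pi(s) -> action and a termination
    probability beta(s) in [0, 1]."""
    def __init__(self, pi, beta):
        self.pi = pi
        self.beta = beta

def intra_option_td_update(w, e, x, x_next, target_next, s, a, option,
                           alpha=0.1, gamma=1.0, lam=0.9):
    """One linear TD(lambda) update for a node predicting `target_next`
    upon termination of `option`.  Intra-option learning: the update is
    applied only if the action taken matches the option's policy."""
    if a != option.pi(s):
        e[:] = 0.0                      # off-option action: cut the trace
        return w, e
    y = w @ x                           # current prediction for this node
    beta = option.beta(s)
    # Bootstrapped target: with probability beta the option terminates
    # (take the target); otherwise keep predicting from the next state.
    z = beta * target_next + (1.0 - beta) * gamma * (w @ x_next)
    # Accumulating trace, decayed by continuation probability (1 - beta).
    e[:] = gamma * lam * (1.0 - beta) * e + x
    w = w + alpha * (z - y) * e
    return w, e

# Usage example with toy data.
rng = np.random.default_rng(0)
w, e = np.zeros(4), np.zeros(4)
opt = Option(pi=lambda s: 0, beta=lambda s: 0.25)
x, x_next = rng.random(4), rng.random(4)
w, e = intra_option_td_update(w, e, x, x_next, target_next=1.0,
                              s=None, a=0, option=opt)
```

Cutting the trace whenever the behavior action departs from the option's policy is what makes the update intra-option: every option whose policy is consistent with the action actually taken can be updated from the same stream of experience.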
