Learning and Planning with Timing Information in Markov Decision Processes

We consider the problem of learning and planning in Markov decision processes with temporally extended actions represented in the options framework. We propose using predictions about the durations of extended actions to represent state, and we show that this yields a compact predictive state representation model whose size is independent of the set of primitive actions. We then develop a consistent and efficient spectral learning algorithm for such models. Representing state with timing information alone enables faster improvement in planning performance. We illustrate our approach with experiments in both synthetic and robot navigation domains.
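
The paper's own algorithm is not reproduced here, but as a rough illustration of the spectral approach the abstract refers to, the sketch below follows the standard spectral recipe for learning an observable-operator / PSR model (in the style of the well-known Hsu-Kakade-Zhang construction for HMMs): estimate low-order joint probabilities from trajectories, take a truncated SVD, and recover one linear operator per observation symbol. All names here (`spectral_psr`, `p1`, `P21`, `P321`) are illustrative assumptions, with symbols standing in for discretized option durations.

```python
import numpy as np

# Hedged sketch, not the paper's algorithm: spectral learning of an
# observable-operator / PSR model from empirical probability estimates.
# Assumed inputs, estimated from sampled trajectories:
#   p1[i]         ~ P(x1 = i)
#   P21[i, j]     ~ P(x2 = i, x1 = j)
#   P321[x][i, j] ~ P(x3 = i, x2 = x, x1 = j)

def spectral_psr(p1, P21, P321, rank):
    """Recover PSR parameters (b1, binf, {B_x}) via truncated SVD."""
    U, _, _ = np.linalg.svd(P21, full_matrices=False)
    U = U[:, :rank]                              # rank-k basis for tests

    b1 = U.T @ p1                                # initial state vector
    binf = np.linalg.pinv(P21.T @ U) @ p1        # normalization vector
    proj = np.linalg.pinv(U.T @ P21)
    # One linear operator per observation symbol (here: a duration bin).
    B = {x: (U.T @ Px) @ proj for x, Px in P321.items()}
    return b1, binf, B

def sequence_prob(seq, b1, binf, B):
    """Estimated probability of an observation sequence under the model."""
    b = b1
    for x in seq:
        b = B[x] @ b                             # apply operators in order
    return float(binf @ b)
```

Given Monte Carlo estimates of `p1`, `P21`, and `P321`, `spectral_psr` returns parameters with which `sequence_prob` scores any duration sequence; statistical consistency of this kind of estimator as the sample size grows is the sort of guarantee the abstract refers to.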
