State Abstractions for Lifelong Reinforcement Learning

In lifelong reinforcement learning, agents must effectively transfer knowledge across tasks while simultaneously addressing exploration, credit assignment, and generalization. State abstraction can help overcome these hurdles by compressing the representation used by an agent, thereby reducing the computational and statistical burdens of learning. To this end, we develop theory for computing and using state abstractions in lifelong reinforcement learning. We introduce two new classes of abstractions: (1) transitive state abstractions, whose optimal form can be computed efficiently, and (2) PAC state abstractions, which are guaranteed to hold with respect to a distribution of tasks. We show that the joint family of transitive PAC abstractions can be acquired efficiently, preserves near-optimal behavior, and experimentally reduces sample complexity in simple domains, thereby yielding a family of desirable abstractions for use in lifelong reinforcement learning. Alongside these positive results, we show that there are pathological cases in which state abstractions can negatively impact performance.
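To make the notion of "compressing the representation" concrete, the sketch below illustrates one standard family of state abstractions, approximate Q*-irrelevance: ground states whose optimal action-values agree within a tolerance epsilon are mapped to the same abstract state. This is only an illustrative example, not the transitive or PAC construction introduced in the paper; the q_values dictionary, actions list, and epsilon threshold are hypothetical inputs.

# Illustrative sketch (not the paper's algorithm): greedily group states whose
# estimated Q*-values agree within epsilon for every action.

def build_abstraction(q_values, actions, epsilon=0.05):
    """q_values: dict mapping (state, action) -> estimated Q*-value.
    Returns a dict mapping each ground state to an abstract cluster id."""
    states = sorted({s for (s, _) in q_values})
    clusters = []   # each cluster is a list of ground states
    phi = {}        # ground state -> abstract state (cluster index)

    def similar(s1, s2):
        # Two states may be merged only if all action-values are within epsilon.
        return all(abs(q_values[(s1, a)] - q_values[(s2, a)]) <= epsilon
                   for a in actions)

    for s in states:
        for idx, cluster in enumerate(clusters):
            # Compare against every member of the cluster, since
            # epsilon-similarity on its own is not a transitive relation.
            if all(similar(s, member) for member in cluster):
                cluster.append(s)
                phi[s] = idx
                break
        else:
            clusters.append([s])
            phi[s] = len(clusters) - 1
    return phi

# Hypothetical toy usage: two states with nearly identical values collapse.
actions = ["left", "right"]
q = {("s1", "left"): 0.50, ("s1", "right"): 0.90,
     ("s2", "left"): 0.52, ("s2", "right"): 0.88,
     ("s3", "left"): 0.10, ("s3", "right"): 0.20}
print(build_abstraction(q, actions, epsilon=0.05))  # -> {'s1': 0, 's2': 0, 's3': 1}

Because epsilon-similarity is not transitive, greedy groupings like this depend on the order in which states are considered; difficulties of this kind are part of what motivates the transitive abstraction class described in the abstract above.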
