Learning State Abstractions for Transfer in Continuous Control

Can simple algorithms with a good representation solve challenging reinforcement learning problems? In this work, we answer this question in the affirmative: we take the "simple learning algorithm" to be tabular Q-Learning, the "good representation" to be a learned state abstraction, and the "challenging problems" to be continuous control tasks. Our main contribution is a learning algorithm that abstracts a continuous state space into a discrete one. We transfer this learned representation to unseen problems to enable effective learning. We provide theory showing that learned abstractions incur bounded value loss, and we report experiments showing that the abstractions empower tabular Q-Learning to learn efficiently in unseen tasks.
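To make the setup concrete, below is a minimal sketch of tabular Q-Learning running on top of a fixed abstraction function phi that maps continuous states to discrete cells. The abstract does not specify how the abstraction is represented, so this sketch assumes a nearest-centroid assignment over centroids learned on source tasks and a hypothetical Gym-style environment interface; it is an illustration of the general recipe, not the paper's exact method.

```python
# Minimal sketch (assumed details, not the paper's exact method): tabular
# Q-Learning applied in an abstract state space. The abstraction phi is
# illustrated by nearest-centroid assignment over centroids assumed to have
# been learned on source tasks.
import numpy as np

class AbstractQLearner:
    def __init__(self, centroids, num_actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.centroids = np.asarray(centroids)  # learned abstraction: one centroid per abstract state
        self.Q = np.zeros((len(self.centroids), num_actions))
        self.num_actions = num_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def phi(self, s):
        # Abstract a continuous state to the index of its nearest centroid.
        return int(np.argmin(np.linalg.norm(self.centroids - np.asarray(s), axis=1)))

    def act(self, s, rng):
        # Epsilon-greedy action selection over the abstract state's Q-values.
        if rng.random() < self.epsilon:
            return int(rng.integers(self.num_actions))
        return int(np.argmax(self.Q[self.phi(s)]))

    def update(self, s, a, r, s_next, done):
        # Standard tabular Q-Learning backup, performed in the abstract state space.
        z, z_next = self.phi(s), self.phi(s_next)
        target = r + (0.0 if done else self.gamma * np.max(self.Q[z_next]))
        self.Q[z, a] += self.alpha * (target - self.Q[z, a])
```

In this sketch, transfer corresponds to reusing phi (the learned centroids) on an unseen task while the Q-table itself is learned from scratch.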
