Strategic Object Oriented Reinforcement Learning

Humans learn to play video games significantly faster than state-of-the-art reinforcement learning (RL) algorithms. Inspired by this, we introduce strategic object oriented reinforcement learning (SOORL), which learns a simple dynamics model through automatic model selection and performs efficient planning with strategic exploration. We compare different exploration strategies in a model-based setting in which exact planning is impossible. Additionally, we evaluate our approach on Pitfall!, arguably the hardest Atari game, and achieve significantly improved exploration and performance over prior methods. A minimal illustrative sketch of planning with an exploration bonus is given below.
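The abstract does not spell out the planner, so the following is only a minimal sketch of what "efficient planning with strategic exploration" can look like: a depth-limited Monte Carlo rollout planner over a learned dynamics model, with a count-based optimism bonus added to rewards. The `ToyChainModel`, the `step(state, action)` interface, and all hyper-parameters are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from collections import defaultdict

BETA = 1.0      # exploration bonus scale (assumed hyper-parameter)
DEPTH = 10      # planning horizon for each rollout
ROLLOUTS = 50   # Monte Carlo rollouts per candidate action
GAMMA = 0.99    # discount factor

visit_counts = defaultdict(int)  # pseudo-counts over (state, action) pairs


def bonus(state, action):
    """Optimistic reward bonus that decays with the visit count."""
    return BETA / math.sqrt(1 + visit_counts[(state, action)])


def rollout_value(model, state, actions, depth):
    """Discounted return of one random rollout, rewards augmented with bonuses."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        action = random.choice(actions)
        next_state, reward, done = model.step(state, action)  # assumed model interface
        total += discount * (reward + bonus(state, action))
        discount *= GAMMA
        if done:
            break
        state = next_state
    return total


def plan(model, state, actions):
    """Choose the action whose sampled rollouts look best under the optimistic model."""
    values = {}
    for a in actions:
        returns = []
        for _ in range(ROLLOUTS):
            next_state, reward, done = model.step(state, a)
            future = 0.0 if done else GAMMA * rollout_value(model, next_state, actions, DEPTH)
            returns.append(reward + bonus(state, a) + future)
        values[a] = sum(returns) / len(returns)
    best = max(values, key=values.get)
    visit_counts[(state, best)] += 1
    return best


class ToyChainModel:
    """Tiny deterministic chain MDP, included only to make the sketch runnable."""
    def __init__(self, length=8):
        self.length = length

    def step(self, state, action):
        next_state = min(state + 1, self.length - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == self.length - 1 else 0.0
        done = next_state == self.length - 1
        return next_state, reward, done


if __name__ == "__main__":
    print(plan(ToyChainModel(), 0, [0, 1]))  # typically picks action 1 (move toward the goal)
```

Because the bonus shrinks as a (state, action) pair is revisited, the planner is drawn toward rarely tried actions early on and toward genuinely rewarding ones later, which is the basic trade-off any strategic exploration scheme in this setting has to make.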
