simple_rl: Reproducible Reinforcement Learning in Python

Conducting reinforcement-learning experiments can be a complex and time-consuming process. A full experimental pipeline typically consists of a simulation of an environment, an implementation of one or more learning algorithms, a variety of additional components designed to facilitate the agent-environment interplay, and any requisite analysis, plotting, and logging. In light of this complexity, this paper introduces simple_rl, a new open-source library for carrying out reinforcement-learning experiments in Python 2 and 3 with a focus on simplicity. The goal of simple_rl is to support seamless, reproducible methods for running reinforcement-learning experiments. This paper gives an overview of the core design philosophy of the package, explains how it differs from existing libraries, and showcases its central features.

[Figure: cumulative reward versus episode number for Q-learning and Random agents; plot title begins "Reproduction: gridworld".]
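To make the pipeline components named in the abstract concrete, the following is a minimal, self-contained sketch of an environment simulation, two learning agents, the agent-environment loop, and simple logging of cumulative reward. It is not the simple_rl API: every class and function name here (`GridWorld`, `RandomAgent`, `QLearningAgent`, `run_experiment`) is hypothetical, and the grid size, reward scheme, and hyperparameters are assumptions chosen only for illustration. Seeding a private random number generator is one simple way to make a run repeatable.

```python
import random


class GridWorld:
    """Hypothetical 4x3 grid: start at (0, 0), goal at the opposite corner."""

    ACTIONS = ("up", "down", "left", "right")
    MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, width=4, height=3):
        self.width, self.height = width, height
        self.goal = (width - 1, height - 1)
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        dx, dy = self.MOVES[action]
        # Clamp the move so the agent stays inside the grid.
        x = min(max(self.state[0] + dx, 0), self.width - 1)
        y = min(max(self.state[1] + dy, 0), self.height - 1)
        self.state = (x, y)
        done = self.state == self.goal
        return self.state, (1.0 if done else 0.0), done


class RandomAgent:
    """Baseline that picks actions uniformly at random and never learns."""

    def __init__(self, actions, rng):
        self.actions, self.rng = actions, rng

    def act(self, state):
        return self.rng.choice(self.actions)

    def update(self, state, action, reward, next_state, done):
        pass


class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""

    def __init__(self, actions, rng, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions, self.rng = actions, rng
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}  # maps (state, action) to an estimated value

    def act(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        values = [self.q.get((state, a), 0.0) for a in self.actions]
        best = max(values)
        # Break ties randomly among the highest-valued actions.
        return self.rng.choice([a for a, v in zip(self.actions, values) if v == best])

    def update(self, state, action, reward, next_state, done):
        future = 0.0 if done else max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (reward + self.gamma * future - old)


def run_experiment(agent_cls, seed, episodes=50, max_steps=30):
    """Run one agent and return its cumulative reward after each episode."""
    rng = random.Random(seed)  # private, seeded RNG so the run is repeatable
    env = GridWorld()
    agent = agent_cls(GridWorld.ACTIONS, rng)
    cumulative, total = [], 0.0
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            agent.update(state, action, reward, next_state, done)
            total += reward
            state = next_state
            if done:
                break
        cumulative.append(total)
    return cumulative
```

Calling `run_experiment(QLearningAgent, seed=0)` and `run_experiment(RandomAgent, seed=0)` yields two cumulative-reward curves that could be plotted against episode number. Repeating a call with the same seed returns the same list, which is the property a reproducible experiment needs.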
