Speeding up Tabular Reinforcement Learning Using State-Action Similarities

One of the most prominent approaches to speeding up reinforcement learning is to inject human prior knowledge into the learning agent. This paper proposes a novel method that speeds up temporal-difference learning using state-action similarities. We evaluate these hand-coded similarities in three well-studied domains of varying complexity, demonstrating the benefits of our approach.
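
To make the idea concrete, the sketch below shows one way a hand-coded similarity function over state-action pairs can spread a temporal-difference update beyond the pair actually visited. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the similarity function sigma, the threshold, and all names below are hypothetical.

```python
from collections import defaultdict

def td_update_with_similarities(Q, pairs, sigma, s, a, r, s_next, actions,
                                alpha=0.1, gamma=0.99, threshold=0.1):
    """One Q-learning step at (s, a), propagated to similar pairs.

    sigma(p, q) is a hand-coded similarity in [0, 1] and pairs is the
    set of all known state-action pairs; both are illustrative
    placeholders, not the paper's definitions.
    """
    # TD error of the visited pair under ordinary Q-learning.
    best_next = max(Q[(s_next, b)] for b in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    # Apply the same update to every pair, scaled by its similarity to
    # (s, a). With sigma((s, a), (s, a)) == 1 the visited pair receives
    # the standard Q-learning update; dissimilar pairs are untouched.
    for sa in pairs:
        w = sigma(sa, (s, a))
        if w >= threshold:
            Q[sa] += alpha * w * td_error

# Toy usage: two states, two actions; similarity 1 for identical pairs,
# 0.5 for pairs sharing the same action (purely illustrative).
states, actions = ["s0", "s1"], ["left", "right"]
pairs = [(s, a) for s in states for a in actions]
sigma = lambda p, q: 1.0 if p == q else (0.5 if p[1] == q[1] else 0.0)
Q = defaultdict(float)
td_update_with_similarities(Q, pairs, sigma, "s0", "left", 1.0, "s1", actions)
```

Propagating each update this way trades extra per-step computation for faster coverage of the value table, which is where a speedup over plain tabular Q-learning would come from.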
