Abstraction in Reinforcement Learning

Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. The problem to be solved often contains subtasks that repeat in different regions of the state space. Without guidance, an agent must learn a solution for every instance of a subtask independently, which degrades the performance of the learning process. In this work, we propose two novel approaches for building connections between different regions of the state space. The first approach efficiently discovers abstractions in the form of conditionally terminating sequences and represents these abstractions compactly as a single tree structure; this structure is then used to determine the actions the agent executes. The second approach defines a similarity function between states based on the number of action sequences they have in common; using this function, updates to the action-value of a state are reflected onto all similar states, allowing experience acquired during learning to be applied in a broader context. The effectiveness of both approaches is demonstrated empirically on a variety of domains.
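To make the first approach concrete, the sketch below shows one way a set of conditionally terminating sequences could be stored in a single tree that shares common action prefixes. This is a minimal illustration under stated assumptions, not the paper's implementation: the node layout, the `step` environment callback, and the tie-breaking rule among eligible children are all assumptions.

```python
# A minimal sketch of a sequence tree for conditionally terminating
# sequences (CTSs), assuming states are hashable and each CTS is a list
# of (continuation_states, action) pairs. Names are illustrative.

class SequenceTreeNode:
    def __init__(self, action=None):
        self.action = action          # action executed at this node
        self.continuation = set()     # states in which execution may continue
        self.children = {}            # action -> child SequenceTreeNode

class SequenceTree:
    """Stores many CTSs compactly by merging common action prefixes."""

    def __init__(self):
        self.root = SequenceTreeNode()

    def insert(self, cts):
        """cts: iterable of (states, action) pairs; merge into the tree."""
        node = self.root
        for states, action in cts:
            child = node.children.setdefault(action, SequenceTreeNode(action))
            child.continuation |= set(states)  # union overlapping conditions
            node = child

    def run(self, state, step):
        """Follow the tree from the root, executing actions while the
        current state satisfies some child's continuation condition.
        step(action) -> next_state is the assumed environment interface."""
        node = self.root
        while True:
            eligible = [c for c in node.children.values()
                        if state in c.continuation]
            if not eligible:
                return state          # the sequence terminates here
            node = eligible[0]        # tie-breaking rule is a design choice
            state = step(node.action)

# Example wiring (illustrative):
# tree = SequenceTree()
# tree.insert([({"s0", "s1"}, "north"), ({"s1", "s2"}, "east")])
# final_state = tree.run("s0", step=env_step)  # env_step(action) -> state
```

The second approach can likewise be sketched as a standard Q-learning step whose temporal-difference error is also reflected onto similar states. The Jaccard-style normalization of the similarity function, the similarity threshold, and the weighting of the propagated update are hypothetical choices for illustration, not the paper's exact definitions.

```python
# A minimal sketch of similarity-based update propagation in Q-learning.
# The Jaccard-style similarity over the sets of action sequences observed
# from each state is an assumption; the paper's normalization may differ.

from collections import defaultdict

def make_similarity(seqs_of):
    """seqs_of: dict mapping state -> set of action-sequence tuples
    observed from that state. Returns sim(s1, s2) in [0, 1]."""
    def sim(s1, s2):
        a, b = seqs_of.get(s1, set()), seqs_of.get(s2, set())
        return len(a & b) / len(a | b) if (a or b) else 0.0
    return sim

def q_update_with_similarity(Q, s, a, r, s_next, actions, states, sim,
                             alpha=0.1, gamma=0.9, threshold=0.8):
    """One Q-learning update on (s, a), with the temporal-difference
    error also applied, weighted by similarity, to all similar states."""
    td = r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)]
    Q[(s, a)] += alpha * td
    for s2 in states:
        if s2 != s and sim(s, s2) >= threshold:
            Q[(s2, a)] += alpha * sim(s, s2) * td  # reflected update

# Example wiring (illustrative):
# Q = defaultdict(float)
# sim = make_similarity({"s0": {("north", "east")},
#                        "s3": {("north", "east")}})
# q_update_with_similarity(Q, "s0", "north", 1.0, "s1",
#                          actions=["north", "east"],
#                          states=["s0", "s3"], sim=sim)
```

Propagating the weighted TD step, rather than copying a raw Q-value, keeps the reflected updates consistent with the learning rate, so experience gathered in one region of the state space benefits every region that behaves alike.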
