Unified Algorithm to Improve Reinforcement Learning in Dynamic Environments - An Instance-based Approach

This paper presents an approach for speeding up the convergence of adaptive intelligent agents that use reinforcement learning algorithms. Accelerating an agent's learning is difficult because an inadequate updating technique may delay the learning process or, worse, induce an unexpected acceleration that makes the agent converge to an unsatisfactory policy. We have developed a policy-estimation technique that combines instance-based learning and reinforcement learning algorithms in Markovian environments. Experimental results in dynamic environments of different dimensions show that the proposed technique speeds up the agents' convergence while still achieving optimal action policies, avoiding problems of classical reinforcement learning approaches.
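The abstract does not say how instance-based learning and reinforcement learning are combined, so the following is only an illustrative sketch of one plausible pairing, not the authors' algorithm. It shows a standard tabular Q-learning update next to a k-nearest-neighbour estimate that averages the values of stored instances. The function names, the instance format `(state, action, value)`, the Euclidean distance, and the defaults `alpha`, `gamma`, and `k` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning step to the table `q` in place."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


def knn_estimate(instances, state, action, k=3):
    """Estimate a Q-value as the mean value of the k stored instances
    closest to `state` (Euclidean distance) that used the same action.

    `instances` is a list of (state, action, value) tuples; this layout is
    a hypothetical choice for the sketch. Returns 0.0 if nothing matches.
    """
    candidates = [
        (math.dist(s, state), v) for (s, a, v) in instances if a == action
    ]
    nearest = sorted(candidates)[:k]
    if not nearest:
        return 0.0
    return sum(v for _, v in nearest) / len(nearest)


# Usage: seed an unvisited state's value from nearby instances, then keep
# refining it with ordinary Q-learning updates.
actions = ["left", "right"]
q = defaultdict(float)
instances = [((0, 0), "right", 1.0), ((0, 1), "right", 3.0), ((5, 5), "right", 100.0)]
q[((0, 0), "right")] = knn_estimate(instances, (0, 0), "right", k=2)
q_update(q, (0, 0), "right", reward=1.0, next_state=(1, 0), actions=actions)
```

In this sketch the instance-based estimate only supplies a starting value; the Q-learning update still drives the table toward the values implied by observed rewards. Whether the paper uses instances this way, or to adjust updates during learning, cannot be determined from the abstract.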
