Adaptive Combination of Behaviors in an Agent

Agents are of interest mainly when confronted with complex tasks. We propose a methodology for the automated design of such agents (in the framework of Markov Decision Processes) in the case where the global task can be decomposed into simpler, possibly concurrent, sub-tasks. This is accomplished by automatically combining basic behaviors using Reinforcement Learning methods. The main idea is to build a global policy as a weighted combination of basic policies, the weights being learned by the agent (using Simulated Annealing in our case). These basic behaviors can either be learned or reused from previous tasks, since they do not need to be tuned to the new task. Furthermore, the agents designed by our methodology are highly scalable: without further refinement of the global behavior, they can automatically combine several instances of the same basic behavior to take into account concurrent occurrences of the same sub-task.
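The core mechanism described above can be illustrated with a minimal sketch: basic policies produce action probabilities, a global policy mixes them with a weight vector, and Simulated Annealing tunes the weights against episode reward. The toy one-dimensional environment, the `go_to` behavior, and all parameter values below are illustrative assumptions, not the paper's actual experimental setup.

```python
import math
import random

ACTIONS = [-1, +1]  # toy 1-D world: move left or right

def go_to(target):
    """Basic behavior: prefer the action that moves toward `target`."""
    def policy(state):
        prefs = [1.0 if (target - state) * a > 0 else 0.1 for a in ACTIONS]
        z = sum(prefs)
        return [p / z for p in prefs]
    return policy

def combined_policy(policies, weights):
    """Global policy: weighted mixture of the basic policies' action probabilities."""
    def policy(state):
        mixed = [0.0] * len(ACTIONS)
        for w, basic in zip(weights, policies):
            for i, p in enumerate(basic(state)):
                mixed[i] += w * p
        z = sum(mixed)
        return [m / z for m in mixed]
    return policy

def evaluate(policy, episodes=200, horizon=20):
    """Fraction of episodes in which the agent reaches state 5 from state 0."""
    rng = random.Random(0)  # fixed seed: deterministic evaluation
    successes = 0
    for _ in range(episodes):
        state = 0
        for _ in range(horizon):
            state += rng.choices(ACTIONS, weights=policy(state))[0]
            if state == 5:
                successes += 1
                break
    return successes / episodes

def anneal_weights(policies, steps=200, temp0=1.0, seed=1):
    """Simulated Annealing over the mixture weights (kept positive, normalized)."""
    rng = random.Random(seed)
    cur = [1.0 / len(policies)] * len(policies)
    cur_score = evaluate(combined_policy(policies, cur))
    best, best_score = cur, cur_score
    for t in range(steps):
        temp = temp0 * (1.0 - t / steps) + 1e-3  # linear cooling schedule
        cand = [max(1e-3, w + rng.gauss(0.0, 0.2)) for w in cur]
        z = sum(cand)
        cand = [w / z for w in cand]
        score = evaluate(combined_policy(policies, cand))
        # Accept improvements always, degradations with Boltzmann probability.
        if score > cur_score or rng.random() < math.exp((score - cur_score) / temp):
            cur, cur_score = cand, score
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

Note that the basic behaviors themselves are never modified: only the mixture weights are searched, which is what allows behaviors learned on previous tasks to be reused as-is.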
