Adaptive action selection using utility-based reinforcement learning

A basic problem for intelligent systems is choosing adaptive actions to perform in a non-stationary environment. Because of the combinatorial complexity of the action space, an agent cannot consider every option available to it at every instant. It needs to find good policies that dictate the best action to perform in each situation. This paper proposes an algorithm, called UQ-learning, that addresses the action selection problem by combining reinforcement learning with a utility function. Reinforcement learning provides information about the environment, and the utility function is used to balance exploration and exploitation. We evaluate the method on maze navigation tasks in a non-stationary environment. The results of simulated experiments show that the utility-based reinforcement learning approach is more effective and efficient than Q-learning and Recency-Based Exploration.
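
The abstract does not give the form of the UQ-learning utility function, so the following is only a minimal sketch of the general idea: a standard tabular Q-learning update paired with a utility that scores each action for selection. The class name `UtilityQAgent`, the `bonus` parameter, and the square-root recency term are assumptions made for illustration, not the authors' definition.

```python
import math
import random
from collections import defaultdict

ACTIONS = ("up", "down", "left", "right")  # maze moves (assumed action set)


class UtilityQAgent:
    """Tabular Q-learning with a hypothetical utility for action selection."""

    def __init__(self, alpha=0.1, gamma=0.9, bonus=0.05, seed=0):
        self.alpha = alpha    # learning rate
        self.gamma = gamma    # discount factor
        self.bonus = bonus    # weight of the exploration term (assumed)
        self.q = defaultdict(float)         # (state, action) -> Q estimate
        self.last_tried = defaultdict(int)  # (state, action) -> step of last try
        self.step = 0
        self.rng = random.Random(seed)

    def utility(self, state, action):
        # Assumed utility: Q estimate plus a bonus that grows with the time
        # since the action was last tried in this state. In a non-stationary
        # maze, long-untried actions become worth re-checking.
        elapsed = self.step - self.last_tried[(state, action)]
        return self.q[(state, action)] + self.bonus * math.sqrt(elapsed)

    def select_action(self, state):
        # Pick the action with the highest utility; break ties at random.
        scores = [self.utility(state, a) for a in ACTIONS]
        best = max(scores)
        candidates = [a for a, s in zip(ACTIONS, scores) if s == best]
        return self.rng.choice(candidates)

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        self.step += 1
        self.last_tried[(state, action)] = self.step
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

With a small `bonus` the agent mostly exploits its current Q estimates. With a larger one it revisits neglected actions sooner, which is the kind of trade-off a utility function would have to tune when the maze changes over time.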
