Reinforcement learning in swarm-robotics for multi-agent foraging-task domain

This paper studies and develops an efficient learning policy to address the exploration vs. exploitation dilemma in a multi-objective foraging task in the swarm robotics domain. To tackle this problem, a learning policy called FIFO-list is proposed. FIFO-list is a model-based learning policy that can attain near-optimal solutions. In FIFO-list, the swarm robots maintain a dynamic list of recently visited states. States in the list are banned from exploration by the swarm robots, regardless of the Q(s, a) values associated with them. The list is updated according to the First-In-First-Out (FIFO) rule: the states that enter the list first exit it first. Recently visited states remain in the list for a dynamic number of time-steps, determined by the desirability of the visited states.
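The list mechanism described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `FifoList`, the helper `select_action`, the tenure bounds, and the dictionary-based model are all hypothetical. In particular, the abstract does not say how desirability maps to the number of time-steps a state stays listed, so the sketch assumes desirability lies in [0, 1] and that less desirable states are held longer. It also assumes a greedy choice among actions whose predicted next state is not listed, with a fallback to all actions when every successor is banned.

```python
from collections import deque
import random


class FifoList:
    """Hypothetical FIFO list of recently visited (banned) states."""

    def __init__(self, min_tenure=1, max_tenure=5):
        self.min_tenure = min_tenure
        self.max_tenure = max_tenure
        self._queue = deque()   # (state, expiry_step), oldest entry on the left
        self._banned = set()    # same states, for O(1) membership checks

    def tenure(self, desirability):
        # Assumption: desirability is in [0, 1] and less desirable states
        # stay banned longer. The source does not specify this mapping.
        d = min(max(desirability, 0.0), 1.0)
        return self.min_tenure + round((1.0 - d) * (self.max_tenure - self.min_tenure))

    def add(self, state, desirability, step):
        """Record a visit to `state` at time-step `step`."""
        if state in self._banned:
            return  # already listed; keep its original position
        self._queue.append((state, step + self.tenure(desirability)))
        self._banned.add(state)

    def expire(self, step):
        """Release states in strict first-in-first-out order."""
        # Only the oldest entry may leave, so a later entry never exits
        # before an earlier one, even if its own tenure is shorter.
        while self._queue and self._queue[0][1] <= step:
            state, _ = self._queue.popleft()
            self._banned.discard(state)

    def __contains__(self, state):
        return state in self._banned


def select_action(q_values, transitions, state, fifo, rng=random):
    """Pick the best-valued action whose predicted next state is not listed.

    q_values:    dict mapping (state, action) -> Q(s, a)
    transitions: dict mapping (state, action) -> next state (the assumed model)
    """
    actions = [a for (s, a) in transitions if s == state]
    allowed = [a for a in actions if transitions[(state, a)] not in fifo]
    if not allowed:
        allowed = actions  # fallback assumption: every successor is banned
    best = max(q_values.get((state, a), 0.0) for a in allowed)
    return rng.choice([a for a in allowed if q_values.get((state, a), 0.0) == best])
```

In this sketch a banned state is skipped even when its Q(s, a) value is the highest available, which mirrors the "regardless of the Q(s, a) values" rule in the abstract. A robot would call `fifo.expire(step)` at the start of each time-step and `fifo.add(state, desirability, step)` after each visit.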
