Paper information - Task distribution through vacancy chains for heterogeneous groups of robots

Task distribution through vacancy chains for heterogeneous groups of robots

Abstract: We present an extension to our adaptive multi-robot task allocation algorithm based on vacancy chains, a resource distribution process common in animal and human societies. The algorithm uses individual reinforcement learning of task utilities and relies on the specializing abilities of the members of the group to promote dedicated optimal allocation patterns. Using realistic simulation experiments, we evaluate the approach by comparing greedy and softmax action selection functions for task allocation. We conclude that using softmax functions makes the vacancy chain algorithm responsive to different levels of ability in a group of heterogeneous robots as well as to the effects of the underlying group dynamics such as interference and synergy.

I. INTRODUCTION

Existing multi-robot task allocation (MRTA) algorithms [1], [2], [3], [4] are typically not sensitive to the complex effects of group dynamics, such as interference and synergy. For a cooperative task such as transportation or foraging, the average completion time may depend on the number of robots that are allocated to the same task. Allocating a robot to a task may have either a positive or negative effect on a group’s performance according to how much that robot contributes positively, in accomplishing tasks, or negatively, in increasing interference. Such dynamics are difficult to model.

As a way of circumventing the difficulties related to modeling group dynamics, our past work [5] presented the vacancy chain (VC) algorithm. This algorithm is inspired by the VC distribution process as found in animal and human societies [6]. Each robot in a group following this algorithm uses local reinforcement learning (RL) to estimate the utilities of a set of tasks. From the local utilities and the robots’ action selection functions, emerges the allocation pattern. Experiments in simulation have shown that for groups of homogeneous robots, the VC algorithm promotes optimal system states as defined by the VC framework.

The VC algorithm relies on stigmergy [7], unintentional communication between the robots through their effects on the environment, to produce specialized individuals for optimal allocation. In this paper we present experimental results showing how the ability of the Boltzmann Softmax action selection function to differentiate between suboptimal actions can be made to work on an inter-robot level, as a mechanism for allocating high-performance robots to high-value tasks without communication. The VC algorithm can thus be extended to work for groups of heterogeneous robots. Extending the VC algorithm to cover groups of heterogeneous robots increases its applicability. This is important because the VC algorithm, unlike existing MRTA algorithms, is sensitive to the effects of group dynamics and provides a way of optimizing the performance of groups of cooperative robots in domains where these effects are significant.

II. M
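The contrast between greedy and Boltzmann softmax action selection over locally learned task utilities can be illustrated with a short sketch. This is not the authors' implementation: the excerpt does not give the update rule, the reward definition, or the temperature, so the function names, the incremental-average update, and every numeric value below are assumptions chosen only for illustration.

```python
import math
import random


def greedy_choice(utilities):
    """Return the index of the task with the highest estimated utility."""
    return max(range(len(utilities)), key=lambda i: utilities[i])


def softmax_probabilities(utilities, temperature):
    """Boltzmann distribution over tasks: p_i proportional to exp(u_i / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(utilities)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp((u - top) / temperature) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_choice(utilities, temperature, rng=random):
    """Sample a task index according to the Boltzmann probabilities."""
    probs = softmax_probabilities(utilities, temperature)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


def update_utility(utilities, task, reward, learning_rate=0.1):
    """Assumed update rule: move the estimate for `task` toward the observed reward."""
    utilities[task] += learning_rate * (reward - utilities[task])
    return utilities


if __name__ == "__main__":
    estimates = [1.0, 0.8, 0.2]  # hypothetical utility estimates for three tasks
    print("greedy picks task", greedy_choice(estimates))
    print("softmax probabilities", softmax_probabilities(estimates, temperature=0.5))
```

Greedy selection only ever looks at the best task, so it treats the second and third tasks identically. The softmax probabilities preserve the ordering among the non-best tasks, which is the property the text describes as differentiating between suboptimal actions.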

Gaurav S. Sukhatme | Torbjørn S. Dahl

[1] Rachid Alami, et al. M+: a scheme for multi-robot cooperation through negotiated task allocation and achievement, 1999, Proceedings 1999 IEEE International Conference on Robotics and Automation (Cat. No.99CH36288C).

[2] Wei Zhang, et al. A Reinforcement Learning Approach to job-shop Scheduling, 1995, IJCAI.

[3] Maja J. Mataric, et al. Learning Multiple Models for Reward Maximization, 2000, ICML.

[4] Gaurav S. Sukhatme, et al. Scheduling with Group Dynamics: A Multi-Robot Task-Allocation Algorithm based on Vacancy Chains, 2002.

[5] Maja J. Mataric, et al. Interaction and intelligent behavior, 1994.

[6] Tucker Balch, et al. Reward and Diversity in Multirobot Foraging, 1999, IJCAI 1999.

[7] Manuela M. Veloso, et al. Task Decomposition, Dynamic Role Assignment, and Low-Bandwidth Communication for Real-Time Strategic Teamwork, 1999, Artif. Intell.

[8] A. Ijspeert, et al. A Macroscopic Analytical Model of Collaboration in Distributed Robotic Systems, 2002, Artificial Life.

[9] Tucker R. Balch. The impact of diversity on performance in multi-robot foraging, 1999, AGENTS '99.

[10] Chris Melhuish, et al. Stigmergy, Self-Organization, and Sorting in Collective Robotics, 1999, Artificial Life.

[11] Manuela M. Veloso, et al. Automatically tracking and analyzing the behavior of live insect colonies, 2001, AGENTS '01.

[12] Roger B. Myerson, et al. Game theory - Analysis of Conflict, 1991.

[13] Anil K. Seth, et al. Modeling Group Foraging: Individual Suboptimality, Interference, and a Kind of Matching, 2001, Adapt. Behav.

[14] Maja J. Mataric, et al. Behaviour-based control: examples from navigation, learning, and group behaviour, 1997, J. Exp. Theor. Artif. Intell.

[15] Maja J. Mataric, et al. Sold!: auction methods for multirobot coordination, 2002, IEEE Trans. Robotics Autom.

[16] Wilfried Brauer, et al. Multi-machine scheduling-a multi-agent learning approach, 1998, Proceedings International Conference on Multi Agent Systems (Cat. No.98EX160).

[17] Michael Sampels, et al. Ant colony optimization for FOP shop scheduling: a case study on different pheromone representations, 2002, Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No.02TH8600).

[18] I. Chase, et al. The vacancy chain process: a new mechanism of resource distribution in animals with application to hermit crabs, 1988, Animal Behaviour.