To teach or not to teach?: decision making under uncertainty in ad hoc teams

In typical multiagent teamwork settings, the teammates are either programmed together or otherwise provided with standard communication languages and coordination protocols. In contrast, this paper presents an ad hoc team setting in which the teammates are not pre-coordinated, yet must still work together to achieve their common goal(s). We represent a specific instance of this scenario, in which a teammate has limited action capabilities and a fixed and known behavior, as a finite-horizon, cooperative k-armed bandit. In addition to motivating and studying this novel ad hoc teamwork scenario, the paper contributes to the k-armed bandits literature by characterizing the conditions under which certain actions are potentially optimal, and by presenting a polynomial-time dynamic programming algorithm that solves for the optimal action when the arm payoffs come from a discrete distribution.
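To make the setting concrete, the following is a minimal sketch of a finite-horizon, cooperative k-armed bandit with one flexible agent (the "teacher") and one teammate with fixed, known behavior (the "learner"). The abstract does not spell out the model's details, so everything specific here is an assumption made for illustration: the teammate's fixed behavior is taken to be greedy choice of the highest sample mean, its limited capabilities are modeled as a restricted arm set, both agents observe every pull, unpulled arms are scored as 0.0, and all function names and payoffs are hypothetical. This is not the paper's dynamic programming algorithm; it only simulates two hand-written teacher policies.

```python
import random

def greedy_learner(sums, counts, allowed):
    """Assumed fixed behavior: pull the allowed arm with the highest sample mean.
    Unpulled arms score 0.0 (an assumption); ties go to the first allowed arm."""
    def mean(i):
        return sums[i] / counts[i] if counts[i] else 0.0
    return max(allowed, key=mean)

def run_episode(arms, learner_arms, teacher_policy, rounds, seed=0):
    """Simulate `rounds` rounds; return the team's total payoff.
    arms: list of (payoffs, probabilities) discrete distributions.
    In each round the teacher pulls first, then the learner; both see all results."""
    rng = random.Random(seed)
    sums = [0.0] * len(arms)
    counts = [0] * len(arms)
    total = 0.0
    for t in range(rounds):
        for actor in ("teacher", "learner"):
            if actor == "teacher":
                arm = teacher_policy(t, sums, counts)
            else:
                arm = greedy_learner(sums, counts, learner_arms)
            payoffs, probs = arms[arm]
            reward = rng.choices(payoffs, weights=probs)[0]
            sums[arm] += reward
            counts[arm] += 1
            total += reward
    return total

# Hypothetical deterministic payoffs: arm 0 pays 10, arm 1 pays 2, arm 2 pays 6.
ARMS = [([10], [1.0]), ([2], [1.0]), ([6], [1.0])]
LEARNER_ARMS = [1, 2]  # assumed limitation: the learner cannot pull arm 0

def always_exploit(t, sums, counts):
    return 0  # always take the best arm for the teacher itself

def teach_once(t, sums, counts):
    return 2 if t == 0 else 0  # demonstrate arm 2 once, then exploit arm 0

print(run_episode(ARMS, LEARNER_ARMS, always_exploit, rounds=3))  # 36.0
print(run_episode(ARMS, LEARNER_ARMS, teach_once, rounds=3))      # 44.0
```

In this toy instance the learner, left alone, settles on its worse arm, so spending one teacher pull on a demonstration raises the team total over three rounds. With a single round the two policies tie at 12.0, which hints at why the horizon matters to the decision of whether to teach.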
