Communicating with Unknown Teammates

Past research has investigated a number of methods for coordinating teams of agents, but as the number of sources of agents grows, agents are increasingly likely to encounter teammates that do not share their coordination methods. It is therefore desirable for agents to be able to form effective ad hoc teams. This research tackles the problem of communication in ad hoc teams, introducing a minimal version of the multiagent, multi-armed bandit problem with limited communication between the agents. This abstract summarizes theoretical results proving that this problem setting can be solved in polynomial time when the agent knows the set of possible teammates, and empirical results showing that the problem can be solved in practice.
