Teaching and leading an ad hoc teammate: Collaboration without pre-coordination

As autonomous agents proliferate in the real world, in both software and robotic settings, they will increasingly need to band together for cooperative activities with previously unfamiliar teammates. In such ad hoc team settings, team strategies cannot be developed a priori. Rather, an agent must be prepared to cooperate with many types of teammates: it must collaborate without pre-coordination. This article considers two aspects of collaboration in two-player teams, involving either simultaneous or sequential decision making. In both cases, the ad hoc agent is more knowledgeable about the environment and attempts to influence the behavior of its teammate so that, together, they attain the best possible joint utility.
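
To make the simultaneous-action setting concrete, the sketch below is an illustrative toy, not the article's exact algorithm. It assumes a shared-payoff matrix game in which the teammate simply best-responds to the ad hoc agent's previous action; knowing the payoff matrix and this teammate model, the ad hoc agent can plan a finite-horizon sequence of actions that leads the teammate to the jointly optimal cell, even at the cost of some short-term payoff. The matrix values and the names `PAYOFF`, `teammate_best_response`, and `plan_leading_sequence` are hypothetical.

```python
"""Minimal sketch of leading a best-response teammate in a repeated
shared-payoff matrix game (illustrative assumptions, not the article's
exact algorithm)."""

import numpy as np

# Shared payoff matrix: rows = ad hoc agent's actions, cols = teammate's actions.
# Values are purely illustrative.
PAYOFF = np.array([
    [25,  1,  0],
    [10, 30, 10],
    [ 0, 33, 40],
])


def teammate_best_response(prev_ad_hoc_action: int) -> int:
    """Assumed teammate model: best-respond to the ad hoc agent's last action."""
    return int(np.argmax(PAYOFF[prev_ad_hoc_action]))


def plan_leading_sequence(start_action: int, horizon: int) -> list[int]:
    """Finite-horizon dynamic program over the ad hoc agent's actions.

    The state is the ad hoc agent's previous action, which determines the
    teammate's next move. Returns the action sequence that maximizes the
    sum of joint payoffs over `horizon` rounds."""
    n = PAYOFF.shape[0]
    value_to_go = {a: 0.0 for a in range(n)}      # indexed by previous action
    best_action = [dict() for _ in range(horizon)]
    for t in reversed(range(horizon)):
        new_value = {}
        for prev in range(n):
            col = teammate_best_response(prev)
            # Our action this round also becomes next round's state.
            scored = [(PAYOFF[a, col] + value_to_go[a], a) for a in range(n)]
            value, action = max(scored)
            new_value[prev] = value
            best_action[t][prev] = action
        value_to_go = new_value
    # Roll the plan forward from the given starting action.
    sequence, prev = [], start_action
    for t in range(horizon):
        a = best_action[t][prev]
        sequence.append(a)
        prev = a
    return sequence


if __name__ == "__main__":
    plan = plan_leading_sequence(start_action=0, horizon=5)
    # With the matrix above, the agent accepts a short-term loss to steer the
    # teammate toward the jointly optimal cell (2, 2).
    print("leading sequence:", plan)
```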

[1]  Sandra Carberry,et al.  Techniques for Plan Recognition , 2001, User Modeling and User-Adapted Interaction.

[2]  A. Schotter,et al.  An Experimental Study of Belief Learning Using Elicited Beliefs , 2002 .

[3]  S. Hart,et al.  A simple adaptive procedure leading to correlated equilibrium , 2000 .

[4]  Sarit Kraus,et al.  Collaborative Plans for Complex Group Action , 1996, Artif. Intell..

[5]  Masaki Aoyagi,et al.  Mutual Observability and the Convergence of Actions in a Multi-Person Two-Armed Bandit Model , 1998 .

[6]  W. Hamilton,et al.  The Evolution of Cooperation , 1984 .

[7]  Andrew W. Moore,et al.  Distributed Value Functions , 1999, ICML.

[8]  J. S. Albus Task decomposition , 1993, Proceedings of 8th IEEE International Symposium on Intelligent Control.

[9]  Ra Kildare,et al.  Ad-hoc online teams as complex systems: agents that cater for team interaction rules , 2004 .

[10]  Yoav Shoham,et al.  Essentials of Game Theory: A Concise Multidisciplinary Introduction , 2008, Essentials of Game Theory: A Concise Multidisciplinary Introduction.

[11]  Michael H. Bowling,et al.  Coordination and Adaptation in Impromptu Teams , 2005, AAAI.

[12]  Edmund H. Durfee,et al.  Rational Coordination in Multi-Agent Environments , 2000, Autonomous Agents and Multi-Agent Systems.

[13]  Milind Tambe,et al.  Towards Flexible Teamwork , 1997, J. Artif. Intell. Res..

[14]  Brett Browning,et al.  Dynamically formed heterogeneous robot teams performing tightly-coordinated tasks , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..

[15]  Samuel Barrett and Peter Stone Ad Hoc Teamwork Modeled with Multi-armed Bandits: An Extension to Discounted Infinite Rewards , 2011 .

[16]  Michael N. Huhns,et al.  Agents for establishing ad hoc cross-organizational teams , 2004 .

[17]  Michael L. Littman,et al.  Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.

[18]  Jürgen Eichberger Bayesian Learning in Repeated Normal Form Games , 1995 .

[19]  Long-Ji Lin,et al.  Self-improving reactive agents: case studies of reinforcement learning frameworks , 1991 .

[20]  Long Ji Lin,et al.  Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[21]  Michael Wooldridge,et al.  Computational Aspects of Cooperative Game Theory (Synthesis Lectures on Artificial Inetlligence and Machine Learning) , 2011 .

[22]  Godfrey Keller,et al.  Strategic Experimentation with Poisson Bandits , 2009 .

[23]  David C. Parkes,et al.  A General Approach to Environment Design with One Agent , 2009, IJCAI.

[24]  Philip R. Cohen,et al.  Plans for Discourse , 2003 .

[25]  Karen E. Lochbaum,et al.  A Collaborative Planning Model of Intentional Structure , 1998, CL.

[26]  Katia P. Sycara,et al.  Distributed Intelligent Agents , 1996, IEEE Expert.

[27]  Ronen I. Brafman,et al.  On Partially Controlled Multi-Agent Systems , 1996, J. Artif. Intell. Res..

[28]  Manuela M. Veloso,et al.  Task Decomposition, Dynamic Role Assignment, and Low-Bandwidth Communication for Real-Time Strategic Teamwork , 1999, Artif. Intell..

[29]  Sandra Zilles,et al.  Models of Cooperative Teaching and Learning , 2011, J. Mach. Learn. Res..

[30]  H. Young,et al.  The Evolution of Conventions , 1993 .

[31]  Yoav Shoham,et al.  Learning against opponents with bounded memory , 2005, IJCAI.

[32]  Noa Agmon,et al.  Leading ad hoc agents in joint action settings with multiple teammates , 2012, AAMAS.

[33]  Sarit Kraus,et al.  Learning Teammate Models for Ad Hoc Teamwork , 2012, AAMAS 2012.

[34]  H. Peyton Young,et al.  The Possible and the Impossible in Multi-Agent Learning , 2007, Artif. Intell..

[35]  Sean Luke,et al.  Cooperative Multi-Agent Learning: The State of the Art , 2005, Autonomous Agents and Multi-Agent Systems.

[36]  Noa Agmon,et al.  Ad hoc teamwork for leading a flock , 2013, AAMAS.

[37]  Gal A. Kaminka,et al.  Integration of Coordination Mechanisms in the BITE Multi-Robot Architecture , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[38]  Candace L. Sidner,et al.  Plan parsing for intended response recognition in discourse 1 , 1985, Comput. Intell..

[39]  Daijiro Okada,et al.  Two-person repeated games with finite automata , 2000, Int. J. Game Theory.

[40]  Michael N. Huhns,et al.  Agents for establishing ad hoc cross-organizational teams , 2004, Proceedings. IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 2004. (IAT 2004)..

[41]  Andrew W. Moore,et al.  Locally Weighted Learning for Control , 1997, Artificial Intelligence Review.

[42]  Karen E. Lochbaum,et al.  An Algorithm for Plan Recognition in Collaborative Discourse , 1991, ACL.

[43]  Ayça Kaya,et al.  When Does it Pay to Get Informed? , 2010 .

[44]  Yoav Shoham,et al.  Multi-Agent Reinforcement Learning:a critical survey , 2003 .

[45]  Lehel Csató,et al.  Sparse On-Line Gaussian Processes , 2002, Neural Computation.

[46]  Robert J. Aumann,et al.  16. Acceptable Points in General Cooperative n-Person Games , 1959 .

[47]  Peter Stone,et al.  Online Multiagent Learning against Memory Bounded Adversaries , 2008, ECML/PKDD.

[48]  L. Shapley A Value for n-person Games , 1988 .

[49]  Sarit Kraus,et al.  Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination , 2010, AAAI.

[50]  Edmund H. Durfee,et al.  Recursive Agent Modeling Using Limited Rationality , 1995, ICMAS.

[51]  Vincent Conitzer,et al.  AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents , 2003, Machine Learning.

[52]  M. Cripps,et al.  Strategic Experimentation with Exponential Bandits , 2003 .

[53]  Sarit Kraus,et al.  Empirical evaluation of ad hoc teamwork in the pursuit domain , 2011, AAMAS.

[54]  Sarit Kraus,et al.  The Evolution of Sharedplans , 1999 .

[55]  Michael Wooldridge,et al.  Computational Aspects of Cooperative Game Theory , 2011, KES-AMSTA.

[56]  Moshe Tennenholtz,et al.  Adaptive Load Balancing: A Study in Multi-Agent Learning , 1994, J. Artif. Intell. Res..

[57]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 2005, IEEE Transactions on Neural Networks.

[58]  Dean Pomerleau,et al.  ALVINN, an autonomous land vehicle in a neural network , 2015 .

[59]  Ming Li,et al.  Soft Control on Collective Behavior of a Group of Autonomous Agents By a Shill Agent , 2006, J. Syst. Sci. Complex..

[60]  Daniel H. Grollman,et al.  Dogged Learning for Robots , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[61]  Noa Agmon,et al.  Role-Based Ad Hoc Teamwork , 2011, AAAI.

[62]  Edmund H. Durfee,et al.  Blissful Ignorance: Knowing Just Enough to Coordinate Well , 1995, ICMAS.

[63]  Erfu Yang,et al.  Multi-robot systems with agent-based reinforcement learning: evolution, opportunities and challenges , 2009, Int. J. Model. Identif. Control..

[64]  H. Robbins Some aspects of the sequential design of experiments , 1952 .

[65]  Peter Stone,et al.  Leading a Best-Response Teammate in an Ad Hoc Team , 2009, AMEC/TADA.

[66]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[67]  Jean Oh,et al.  Electric Elves: Applying Agent Technology to Support Human Organizations , 2001, IAAI.

[68]  Gita Sukthankar,et al.  Toward identifying process models in ad hoc and distributed teams , 2008 .

[69]  Manuela M. Veloso,et al.  Modeling and learning synergy for team formation with heterogeneous agents , 2012, AAMAS.

[70]  R. Aumann Subjectivity and Correlation in Randomized Strategies , 1974 .

[71]  Manuela M. Veloso,et al.  Multiagent Systems: A Survey from a Machine Learning Perspective , 2000, Auton. Robots.

[72]  Craig Boutilier,et al.  The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.

[73]  H. Peyton Young,et al.  Individual Strategy and Social Structure , 2020 .

[74]  Feng Wu,et al.  Online Planning for Ad Hoc Autonomous Agent Teams , 2011, IJCAI.

[75]  Tim Roughgarden,et al.  Algorithmic Game Theory , 2007 .

[76]  Kee-Eung Kim,et al.  Learning to Cooperate via Policy Search , 2000, UAI.