Allocating training instances to learning agents for team formation

Agents can learn to improve their coordination with their teammates and thus increase team performance. Training instances are finite, and each training instance is an opportunity for the learning agents to improve their coordination. In this article, we focus on allocating training instances to pairs of learning agents, i.e., pairs that improve their coordination with each other, with the goal of team formation. Because agents learn at different rates, the allocation of training instances affects the performance of the team that is eventually formed. We build upon previous work on the Synergy Graph model, which is learned entirely from data and represents agents' capabilities and compatibility in a multi-agent team. We formally define the learning-agents team formation problem and compare it with the multi-armed bandit problem. We consider learning agent pairs that improve linearly and geometrically, i.e., whose marginal improvement decreases by a constant factor. We contribute algorithms that allocate the training instances and compare them against algorithms for the multi-armed bandit problem. In our simulations, our algorithms perform similarly to the bandit algorithms in the linear case and outperform them in the geometric case. Further, we apply our model and algorithms to a multi-agent foraging problem, demonstrating the efficacy of our algorithms in general multi-agent problems.
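The two improvement regimes described above can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a hypothetical model in which each agent pair has an initial marginal gain `delta` and a decay factor `ratio` (1.0 gives linear improvement, `ratio < 1` gives geometric improvement), and a simple greedy allocator spends a finite budget of training instances one at a time on the pair with the largest next marginal gain:

```python
class LearningPair:
    """A pair of learning agents whose pairwise coordination (synergy)
    improves with each training instance they receive.

    Hypothetical model: the k-th instance given to this pair adds
    delta * ratio**k to its synergy. With ratio == 1.0 the improvement
    is linear; with ratio < 1.0 the marginal improvement decreases by
    a constant factor, i.e., geometric improvement.
    """

    def __init__(self, synergy, delta, ratio=1.0):
        self.synergy = synergy  # current coordination level of the pair
        self.delta = delta      # marginal gain of the first instance
        self.ratio = ratio      # per-instance decay factor
        self.trained = 0        # instances allocated to this pair so far

    def marginal_gain(self):
        """Improvement the pair would get from one more instance."""
        return self.delta * self.ratio ** self.trained

    def train(self):
        """Consume one training instance and update the pair's synergy."""
        self.synergy += self.marginal_gain()
        self.trained += 1


def allocate_greedy(pairs, budget):
    """Spend a finite budget of training instances one at a time,
    always on the pair with the largest next marginal gain.
    Assumes the improvement rates are known; the harder problem in the
    article is allocating when rates must be estimated online."""
    for _ in range(budget):
        best = max(pairs, key=LearningPair.marginal_gain)
        best.train()
    return [p.synergy for p in pairs]


# Example: one geometrically improving pair and one linearly improving
# pair competing for 5 training instances.
geometric = LearningPair(synergy=0.0, delta=1.0, ratio=0.5)
linear = LearningPair(synergy=0.0, delta=0.4, ratio=1.0)
print(allocate_greedy([geometric, linear], budget=5))  # [1.5, 1.2]
```

Under these assumed parameters the greedy allocator gives the geometric pair its first two instances (gains 1.0 and 0.5) and then switches to the linear pair once the geometric pair's marginal gain falls below 0.4, which mirrors the intuition that geometric learners saturate and stop being worth further training.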
