Paper information - Negotiating team formation using deep reinforcement learning

Negotiating team formation using deep reinforcement learning

Abstract: When autonomous agents interact in the same environment, they must often cooperate to achieve their goals. One way for agents to cooperate effectively is to form a team, make a binding agreement on a joint plan, and execute it. However, when agents are self-interested, the gains from team formation must be allocated appropriately to incentivize agreement. Various approaches for multi-agent negotiation have been proposed, but they typically work only for particular negotiation protocols. More general methods usually require human input or domain-specific data, and so do not scale. To address this, we propose a framework for training agents to negotiate and form teams using deep reinforcement learning. Importantly, our method makes no assumptions about the specific negotiation protocol, and is instead completely experience-driven. We evaluate our approach on both non-spatial and spatially extended team-formation negotiation environments, demonstrating that our agents beat hand-crafted bots and reach negotiation outcomes consistent with fair solutions predicted by cooperative game theory. Additionally, we investigate how the physical location of agents influences negotiation outcomes.
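To make "fair solutions predicted by cooperative game theory" concrete, the sketch below computes the Shapley value, one standard fair-allocation concept, for a small weighted voting game. The abstract does not say which solution concept or game the authors use, so the choice of the Shapley value, the weighted voting setting, the function name `shapley_weighted_voting`, and the example weights and quota are all illustrative assumptions, not the paper's method.

```python
from fractions import Fraction
from itertools import permutations


def shapley_weighted_voting(weights, quota):
    """Exact Shapley values for a weighted voting game.

    Hypothetical helper for illustration only. A coalition "wins" when
    its total weight reaches the quota. An agent's Shapley value is the
    fraction of agent orderings in which that agent is pivotal, i.e. the
    one whose arrival first turns a losing coalition into a winning one.
    """
    n = len(weights)
    pivot_counts = [0] * n
    num_orders = 0
    for order in permutations(range(n)):
        running_weight = 0
        for agent in order:
            running_weight += weights[agent]
            if running_weight >= quota:
                pivot_counts[agent] += 1
                break  # only the first agent to reach the quota is pivotal
        num_orders += 1
    return [Fraction(count, num_orders) for count in pivot_counts]


# Assumed example: three agents with weights 4, 3, 2 and a quota of 6.
values = shapley_weighted_voting([4, 3, 2], quota=6)
for agent, value in enumerate(values):
    print(f"agent {agent}: weight share of surplus = {value}")
```

In this assumed game the heaviest agent is pivotal in four of the six orderings, so its Shapley share is 2/3, while the other two agents receive 1/6 each. A comparison of this kind, between shares that agents actually negotiate and the shares a solution concept prescribes, is one way to read the abstract's claim about fair outcomes. The enumeration is factorial in the number of agents and is only practical for small games.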
