Learning Sharing Behaviors with Arbitrary Numbers of Agents

We propose a method for modeling and learning turn-taking behaviors for accessing a shared resource. We model the behavior of each agent in an interaction individually and then use a multi-agent fusion model to summarize the expected actions of the group, which makes the model independent of the number of agents. The individual behavior models are weighted finite state transducers (WFSTs) whose weights are updated dynamically during interactions, and the multi-agent fusion model is a logistic regression classifier. We test our models in a multi-agent tower-building environment, where a Q-learning agent learns to interact with rule-based agents. Our approach models the underlying behavior patterns of the rule-based agents with accuracy between 0.63 and 1.0, depending on the stochasticity of the other agents' behaviors. In addition, we show using KL-divergence that the model accurately captures the distribution of next actions when interacting with both a single agent (KL-divergence
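As an illustration of the KL-divergence comparison mentioned in the abstract, the following is a minimal sketch of how the divergence between an observed and a predicted next-action distribution could be computed. The action names (`take`, `wait`), the probability values, and the `kl_divergence` helper are hypothetical and are not taken from the paper; the sketch only shows the standard discrete definition, KL(p || q) = sum over a of p(a) log(p(a) / q(a)).

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two discrete distributions.

    p and q are dicts mapping an action label to its probability.
    Actions missing from q are floored at eps to avoid division by zero
    (an assumption of this sketch, not something stated in the paper).
    """
    total = 0.0
    for action, pa in p.items():
        if pa == 0.0:
            continue  # terms with p(a) = 0 contribute nothing
        qa = max(q.get(action, 0.0), eps)
        total += pa * math.log(pa / qa)
    return total


# Hypothetical next-action distributions for one rule-based agent.
observed = {"take": 0.7, "wait": 0.3}
predicted = {"take": 0.6, "wait": 0.4}

print(kl_divergence(observed, predicted))
```

A value near zero indicates that the predicted distribution closely matches the observed one; the divergence is exactly zero only when the two distributions are identical.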
