Rewards for pairs of Q-learning agents conducive to turn-taking in medium-access games

We describe a class of stateful games, which we call ‘medium-access games’, as a model for human and machine communication. We demonstrate how the Nash equilibria of those games, as played by pairs of agents with stationary policies, can be used to predict turn-taking behaviour in Q-learning agents from the agents’ reward function. We identify which fixed policies exhibit turn-taking behaviour in medium-access games and show how to compute the Nash equilibria of such games by using Markov chain methods to calculate the agents’ expected rewards for different stationary policies. We present simulation results for an extensive range of reward functions for pairs of Q-learners playing medium-access games, and we use our analysis of stationary agents to develop predictors for the emergence of turn-taking. We explain how to use our predictors to design reward functions for pairs of Q-learning agents that are conducive (or prohibitive) to the emergence of turn-taking in medium-access games. We focus on designing multi-agent reinforcement learning systems that deliberately produce coordinated turn-taking, but we also intend our results to be useful for analysing emergent turn-taking behaviour. Based on our turn-taking results, we suggest ways to use our methodology to design rewards for quantifiable behaviours besides turn-taking.
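As a rough illustration of the Markov chain step mentioned above, the following sketch computes long-run expected rewards for a pair of stationary policies in a toy two-agent game. Everything specific here is an assumption made for illustration and is not taken from the paper: the state is taken to be the previous joint action, the reward values in `toy_reward` are hypothetical, and the policies `turn_taker_1`, `turn_taker_2` and `greedy` are invented examples. The stationary distribution is approximated by iterating a lazy version of the chain, which is one simple choice among several.

```python
from itertools import product

# Assumption: the state is the previous joint action (a1, a2),
# with 1 = transmit and 0 = wait. This is a toy model only.
STATES = list(product((0, 1), repeat=2))

def transition_matrix(policy1, policy2):
    """Each policy maps a state to the probability of transmitting next."""
    P = []
    for s in STATES:
        p1, p2 = policy1[s], policy2[s]
        row = []
        for a1, a2 in STATES:
            q1 = p1 if a1 else 1.0 - p1
            q2 = p2 if a2 else 1.0 - p2
            row.append(q1 * q2)  # agents act independently given the state
        P.append(row)
    return P

def stationary_distribution(P, iters=2000):
    """Approximate pi with pi = pi P by iterating the lazy chain (I + P) / 2.

    The lazy chain has the same stationary distribution as P and is
    aperiodic, so the iteration also settles for alternating policies.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        stepped = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * pi[j] + 0.5 * stepped[j] for j in range(n)]
    return pi

def toy_reward(state):
    """Hypothetical rewards: +1 for a sole transmitter, -1 each on collision."""
    a1, a2 = state
    if a1 and a2:
        return (-1.0, -1.0)
    return (float(a1), float(a2))

def expected_rewards(policy1, policy2, reward=toy_reward):
    """Long-run average reward per step for each agent."""
    pi = stationary_distribution(transition_matrix(policy1, policy2))
    r1 = sum(p * reward(s)[0] for p, s in zip(pi, STATES))
    r2 = sum(p * reward(s)[1] for p, s in zip(pi, STATES))
    return r1, r2

EPS = 0.05  # small action noise keeps the chain ergodic

# Noisy alternation: transmit after the other agent did, wait after own turn.
turn_taker_1 = {(0, 1): 1 - EPS, (1, 0): EPS, (0, 0): 0.5, (1, 1): 0.5}
turn_taker_2 = {(1, 0): 1 - EPS, (0, 1): EPS, (0, 0): 0.5, (1, 1): 0.5}
# Noisy "always transmit" policy, used as a comparison point.
greedy = {s: 1 - EPS for s in STATES}

if __name__ == "__main__":
    print("turn-taking pair:", expected_rewards(turn_taker_1, turn_taker_2))
    print("greedy pair:     ", expected_rewards(greedy, greedy))
```

Under these assumed rewards, the alternating pair earns a positive average reward while the always-transmit pair collides most of the time. Comparing such expected rewards under unilateral changes of policy is one way to check whether a candidate pair of stationary policies is an equilibrium within a finite policy set.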
