Repeated Triangular Trade: Sustaining Circular Cooperation with Observation Errors

We introduce a new fundamental problem called triangular trade, a natural extension of the well-studied prisoner’s dilemma to three (or more) players in which a player cannot directly punish a seemingly defecting player. More specifically, the problem models a situation where the power/influence of players is one-way: players would be better off if they maintained circular cooperation, but each player has an incentive to defect. We analyze whether players can sustain such circular cooperation when they play this game repeatedly and each player observes the actions of another player with some observation errors (imperfect private monitoring). We confirm that no simple strategy can constitute an equilibrium under any reasonable parameter setting when there are only two actions, “Cooperate” and “Defect.” Thus, we introduce two additional actions, “Whistle” and “Punish,” which can be regarded as slight modifications of “Cooperate.” With these actions, players can achieve sustainable cooperation using a simple strategy called the Remote Punishment (RP) strategy, which constitutes an equilibrium for a wide range of parameters. Furthermore, we show that the payoff obtained by a variant of RP is optimal within a very general class of strategies that covers virtually all meaningful strategies.
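
As a rough, hedged illustration of the setting (not the paper’s formal model), the sketch below simulates one round of a three-player circular stage game with noisy private monitoring: each player benefits only from their predecessor on the cycle, so the harmed player cannot retaliate against the defector directly. The benefit/cost values `B` and `C`, the monitoring structure, and the symmetric error rate `EPS` are illustrative assumptions.

```python
import random

# Illustrative sketch of a triangular-trade stage game with noisy private
# monitoring.  The payoff values (B, C) and the error model are assumptions
# made for illustration only; they are not taken from the paper.

N = 3              # players on a cycle: player i "serves" player (i + 1) % N
B, C = 2.0, 1.0    # assumed benefit received / cost paid when cooperating (B > C)
EPS = 0.05         # assumed probability that an observed action is flipped

def stage_payoffs(actions):
    """actions[i] in {'C', 'D'}; player i benefits from player (i - 1)'s action."""
    return [B * (actions[(i - 1) % N] == 'C') - C * (actions[i] == 'C')
            for i in range(N)]

def observe(action):
    """The monitoring player sees the true action, flipped with probability EPS."""
    if random.random() < EPS:
        return 'D' if action == 'C' else 'C'
    return action

# All-cooperate yields B - C = 1.0 per round for every player, while a unilateral
# defector saves the cost C, which is the incentive to defect described above.
actions = ['C', 'C', 'C']
print(stage_payoffs(actions))          # [1.0, 1.0, 1.0]
print([observe(a) for a in actions])   # each private signal is wrong with prob. EPS
```

In this sketch, the player who observes a defection is also the one harmed by it, yet their own cooperation choice only affects the next player on the cycle; this mirrors the abstract’s point that direct punishment is impossible and punishment must somehow be carried around the cycle.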
