Learning to cooperate in a continuous tragedy of the commons

In previous work, we discussed that social dilemmas are often present in multi-agent systems [3]. Social dilemmas are problems in which a good solution can only be found if agents consider the benefit of others in addition to their own. Altruistic punishment has been identified as an important mechanism to enforce this consideration; however, because the punishment is altruistic, deciding whether to punish essentially entails a second-order social dilemma. We developed a methodology that allows individually learning agents to reach satisfactory solutions in a social dilemma with a continuous strategy space, the Ultimatum Game [2], and extended this methodology to thousands of agents using social networks [4]. Moreover, we devoted attention to the tragedy of the commons, a social dilemma typically exemplified by the Public Goods Game (PGG) [1]. In this repeatedly played game, every agent i (out of n) decides on an investment μi ∈ [0, C]. The summed investment is multiplied by a factor 1 < r < n and distributed equally over all agents. Agent i's individual benefit (or reward) is maximized by μi = 0, whereas the group gains the most by collectively playing μi = C. Altruistic punishment (i.e., reducing another agent's reward by an amount e, at a cost c < e to the punisher) allows agents to force others to invest more, but performing such punishment is clearly not individually rational. In earlier work, we restricted ourselves to a small number of strategies and/or agents in this game [1].
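The PGG payoff structure described above can be sketched as follows. This is an illustrative implementation under the stated rules, not the authors' code; the function names and the keep-or-invest accounting (agent i keeps C − μi and receives an equal share of the multiplied pot) are assumptions consistent with the standard formulation.

```python
def pgg_rewards(investments, C, r):
    """Reward of each agent: the endowment kept plus an equal
    share of the summed investment multiplied by r (1 < r < n)."""
    n = len(investments)
    share = r * sum(investments) / n
    return [C - mu + share for mu in investments]

def apply_punishment(rewards, punisher, target, e, c):
    """Altruistic punishment: reduce the target's reward by e,
    at a cost c < e to the punisher."""
    rewards = list(rewards)
    rewards[punisher] -= c
    rewards[target] -= e
    return rewards
```

For example, with n = 4, C = 10, and r = 2, full cooperation yields `pgg_rewards([10, 10, 10, 10], 10, 2) == [20, 20, 20, 20]`, while a single defector earns 25 against 15 for the cooperators, illustrating why μi = 0 is individually rational even though collective investment pays more.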