Flexible theft and resolute punishment: Evolutionary dynamics of social behavior among reinforcement-learning agents

James MacGlashan, Brown University, Providence, Rhode Island, USA
Michael Littman, Brown University, Providence, Rhode Island, USA
Fiery Cushman, Brown University, Providence, Rhode Island, USA

Abstract: Existing models of the evolution of social behavior typically involve innate strategies such as tit-for-tat. Yet both behavioral and neural evidence indicates a substantial role for learned social behavior. We explore the evolutionary dynamics of two simple social behaviors among learning agents: theft and punishment. In our simulation, agents employ Q-learning, a common reinforcement learning algorithm. Agents reproduce in proportion to the objective rewards they accrue, but the subjective reward function that guides learning and action evolves by natural selection. We find that agents typically evolve a bias to punish thieves that is strong enough that it cannot be unlearned. Agents also typically evolve a bias to abstain from theft, but this bias is weak enough to permit rapid learning. This flexibility allows would-be thieves to exploit non-punishers. Finally, we show qualitatively similar results in a behavioral experiment with human participants: flexible theft, but resolute punishment.
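
The separation between objective and subjective reward described in the abstract can be illustrated with a minimal tabular Q-learning sketch. It is an illustration under stated assumptions, not the authors' simulation: the class name `BiasedQLearner`, the per-action `bias` dictionary, the additive form of the subjective reward, and all numeric values are hypothetical. The point it shows is only that the learner updates its Q-values from objective reward plus an innate bias, while reproduction would depend on the objective reward alone.

```python
from dataclasses import dataclass, field


@dataclass
class BiasedQLearner:
    """Tabular Q-learner that learns from objective reward plus an innate bias.

    The `bias` mapping stands in for the evolved subjective reward function.
    Its additive, per-action form is an assumption made for this sketch.
    """
    actions: tuple
    bias: dict            # hypothetical evolved bonus or penalty per action
    alpha: float = 0.1    # learning rate
    gamma: float = 0.9    # discount factor
    q: dict = field(default_factory=dict)

    def value(self, state, action):
        # Unseen state-action pairs default to a Q-value of zero.
        return self.q.get((state, action), 0.0)

    def greedy(self, state):
        # Action with the highest current Q-value in this state.
        return max(self.actions, key=lambda a: self.value(state, a))

    def update(self, state, action, objective_reward, next_state):
        # The learning signal is the subjective reward, not the objective one.
        subjective = objective_reward + self.bias.get(action, 0.0)
        best_next = max(self.value(next_state, a) for a in self.actions)
        target = subjective + self.gamma * best_next
        old = self.value(state, action)
        self.q[(state, action)] = old + self.alpha * (target - old)
        return subjective


# Hypothetical victim: punishing costs 1 objectively, but a strong innate
# bias (+3) makes it subjectively rewarding, so experience reinforces it.
victim = BiasedQLearner(actions=("punish", "ignore"), bias={"punish": 3.0})
for _ in range(50):
    victim.update("robbed", "punish", objective_reward=-1.0, next_state="end")
    victim.update("robbed", "ignore", objective_reward=0.0, next_state="end")

# Hypothetical thief: a weak innate penalty on stealing (-0.5) is outweighed
# by an unpunished gain of +1, so the learner still comes to prefer theft.
thief = BiasedQLearner(actions=("steal", "abstain"), bias={"steal": -0.5})
for _ in range(50):
    thief.update("near_victim", "steal", objective_reward=1.0, next_state="end")
    thief.update("near_victim", "abstain", objective_reward=0.0, next_state="end")
```

With these made-up numbers, `victim.greedy("robbed")` selects `"punish"` even though punishing lowers objective reward, and `thief.greedy("near_victim")` selects `"steal"` because the weak bias is overridden by experience. An evolutionary layer, omitted here, would mutate `bias` across generations and select on accumulated objective reward.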