Evolving subjective utilities: Prisoner's Dilemma game examples

We previously proposed utility-based Q-learning, which supposes that an agent has an internal emotional mechanism that derives subjective utilities from objective rewards, and that the agent uses these utilities, rather than the raw rewards, as the rewards in Q-learning. We also proposed an emotional mechanism of this kind that facilitates cooperative actions in Prisoner's Dilemma (PD) games. That mechanism, however, was designed and implemented by hand in order to force the agents to take cooperative actions in PD games. Because a hand-crafted mechanism seems somewhat unnatural, this work asks whether such an emotional mechanism exists and where it comes from. We use simulation experiments with a genetic algorithm to evolve mechanisms that facilitate cooperative actions in PD games, and we investigate the evolved mechanisms from several points of view.
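
The idea of learning from subjective utilities instead of objective rewards can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the payoff values (T=5, R=3, P=1, S=0), the stateless form of the learner, the representation of the emotional mechanism as a lookup table from reward to utility, and the names `UtilityQAgent`, `PAYOFF`, and `play`. It is not the mechanism described or evolved in this work.

```python
import random

ACTIONS = ("C", "D")  # cooperate, defect

# Row player's objective payoffs. The values T=5, R=3, P=1, S=0 are a common
# textbook choice and an assumption here, not taken from the source.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


class UtilityQAgent:
    """Stateless Q-learner that learns from subjective utilities (a sketch)."""

    def __init__(self, utility, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        # Hypothetical emotional mechanism: a table mapping each possible
        # objective reward to a subjective utility.
        self.utility = utility
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        self.q = {a: 0.0 for a in ACTIONS}

    def act(self):
        # Epsilon-greedy action selection over the two actions.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[a])

    def learn(self, action, reward):
        # The only change from plain Q-learning: the update uses the utility
        # derived from the reward instead of the reward itself.
        u = self.utility[reward]
        target = u + self.gamma * max(self.q.values())
        self.q[action] += self.alpha * (target - self.q[action])


def play(agent_a, agent_b, rounds=1000):
    """Play an iterated PD and return each agent's total objective reward."""
    total_a = total_b = 0
    for _ in range(rounds):
        a, b = agent_a.act(), agent_b.act()
        ra, rb = PAYOFF[(a, b)], PAYOFF[(b, a)]
        agent_a.learn(a, ra)
        agent_b.learn(b, rb)
        total_a += ra
        total_b += rb
    return total_a, total_b
```

With the identity table `{0: 0, 1: 1, 3: 3, 5: 5}` the agent reduces to an ordinary Q-learner. Under this encoding, one way to set up an evolutionary search would be to treat the four table entries as a genome and score each genome by the objective reward its agent accumulates in `play`; this is a guess at a workable setup, not the encoding or fitness function used in this work.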
