In this paper, after a short return to the description of the classical version of the Iterated Prisoner's Dilemma (IPD) and its application to the study of cooperation, we present a new strategy we have found, named gradual, which outperforms tit-for-tat, the strategy on which much of the theory of cooperation is based. Since no pure strategy is evolutionarily stable in the IPD, we cannot give a mathematical proof of the absolute superiority of gradual; instead we present a convergent set of facts that should be viewed as strong experimental evidence of the superiority of gradual over tit-for-tat in almost every rich environment. We study in detail why this strategy is a good one and try to identify how it differs from tit-for-tat. We then show how we have improved the strength of gradual by using a genetic algorithm, with a genotype we have created that covers many well-known strategies for the IPD, such as tit-for-tat. We present our ideas on a tree representation of the strategy space, and finally we propose a new view of the evolution of cooperation in which complexity plays a major role.

1 The Classical Iterated Prisoner's Dilemma

In this paper we first present the classical version of the Iterated Prisoner's Dilemma, which is fully described in [29], and its application to the study of cooperation. We insist on the importance of the pure version of this problem. We then present a new strategy we have found, called gradual, which outperforms tit-for-tat, the strategy classically recognized as the best possible in the classical IPD. In section 2, we present the gradual strategy in detail: it has the same qualities as tit-for-tat, plus progressive retaliation and forgiveness. Two experiments are reported: the first with twelve general strategies, to show its precise behavior; the second with the results of a tournament we organized in the French edition of the "Scientific American".
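The progressive retaliation and forgiveness just mentioned can be sketched in a few lines. This is a minimal illustration, not the paper's exact specification: we assume that after the opponent's nth defection gradual retaliates with n defections and then calms down with two cooperations, and our bookkeeping while a punishment is already under way is a simplification.

```python
def make_gradual():
    """A sketch of one gradual player (assumed behavior: after the
    opponent's nth defection, answer with n defections, then two
    calming cooperations)."""
    state = {"seen": 0, "queue": []}

    def move(opp_history):
        # Count every defection the opponent has just played.
        if opp_history and opp_history[-1] == 'D':
            state["seen"] += 1
            # If not already punishing, schedule progressive retaliation
            # followed by two cooperations (forgiveness).
            if not state["queue"]:
                state["queue"] = ['D'] * state["seen"] + ['C', 'C']
        if state["queue"]:
            return state["queue"].pop(0)
        return 'C'  # cooperate by default

    return move
```

Against a constant defector, this sketch opens with cooperation, punishes once, forgives twice, then punishes more severely, illustrating the escalation that distinguishes gradual from tit-for-tat's fixed one-for-one reply.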
In section 3 we try to improve the strength of this strategy by using a genetic algorithm, on a genotype we have created which covers many well-known strategies (in fact our genotype can cover more than 8 × 10 strategies). We present our ideas on a tree representation of the strategy space, and finally we propose a new view of the evolution of cooperation in which complexity plays a major role. In the last sections we describe our results and discuss the natural behavior of this strategy and its good robustness in ecological competitions.

1.1 IPD and Artificial Life

This game comes from the game theory of John von Neumann and Oskar Morgenstern and was introduced by RAND game theorists Merrill Flood and Melvin Dresher in 1952. The idea was to introduce some irrationality into game theory, which is used as a way of modelling interactions between individuals. The game has proved to be a very good way of studying cooperation and the evolution of cooperation, and a theory of cooperation based upon reciprocity has been developed in a wide literature, such as [2, 4, 5]. Experimental studies of the IPD and its strategies require a lot of computation time; thus, with the progress of computers, many computer scientists and mathematicians have studied it, applying specific methods such as genetic algorithms, see [1, 3, 6, 9, 19, 20, 25, 26, 30, 31]. As cooperation is a topic of continuing interest for the social, zoological and biological sciences, much work on the IPD has also been done in those fields: [7, 8, 16, 17, 18, 21, 22, 24, 28, 23]. Although the people who have studied the IPD come from different research fields, it can be said that they are all working on the same topic, which belongs to the field of Artificial Life, the bottom-up study of life.

1.2 Recall on the IPD

Let two artificial agents have the choice between cooperation and defection.
They play one against the other, in a synchronous manner, so that neither knows what the other will play. They get a score according to the outcome of the move:

• They both cooperate: each gets the cooperation reward, R points.
• They both defect: each gets the selfish punishment, P points.
• One defects while the other cooperates: the defector gets the temptation payoff, T points, and the cooperator gets the sucker's payoff, S points.

To have a dilemma, temptation must be better than cooperation, which must be better than punishment, which must be better than being the sucker. This can be formalised as:

T > R > P > S

Since this one-shot version of the Prisoner's Dilemma is not very interesting (the most rational choice is to defect), the game is iterated, the final score being the sum of the scores of all moves. Each player does not know how many moves there will be, so each agent's strategy can be studied, to see, for instance, how each player tries to establish cooperation in the game. To prevent the one-shot solution from influencing strategies, by giving too much weight to temptation relative to cooperation, it is useful to add the following restriction:

2R > T + S

With this restriction, strategies gain no advantage from alternately cooperating and defecting. The classical payoff matrix used is shown in table 1.
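The rules above can be sketched as a short simulation. The payoff values T = 5, R = 3, P = 1, S = 0 are the classical ones from Axelrod's tournaments and are an assumption here, since table 1 is not reproduced in this excerpt; they satisfy both constraints.

```python
# Classical payoff values (assumed; see the constraints below).
T, R, P, S = 5, 3, 1, 0
assert T > R > P > S       # the dilemma ordering
assert 2 * R > T + S       # no gain from alternating C and D

def payoff(a, b):
    """Scores of one synchronous move; 'C' = cooperate, 'D' = defect."""
    if a == 'C' and b == 'C':
        return R, R        # mutual cooperation reward
    if a == 'D' and b == 'D':
        return P, P        # mutual defection punishment
    if a == 'D':
        return T, S        # temptation vs. sucker's payoff
    return S, T

def play(strat_a, strat_b, rounds=10):
    """Iterate the game; each strategy sees only the opponent's history."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        pa, pb = payoff(a, b)
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

# Two textbook strategies, for illustration.
tit_for_tat = lambda opp: 'C' if not opp else opp[-1]
all_defect = lambda opp: 'D'
```

For example, `play(all_defect, tit_for_tat, 10)` gives the defector one temptation payoff on the first move and mutual punishment thereafter, showing why the iterated game, unlike the one-shot game, makes sustained defection unprofitable against reciprocators.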
[1] R. Boyd et al. No pure strategy is evolutionarily stable in the repeated Prisoner's Dilemma game. Nature, 1987.
[2] M. Nowak et al. The evolution of stochastic strategies in the Prisoner's Dilemma. 1990.
[3] Mark D. Smucker et al. Analyzing Social Network Structures in the Iterated Prisoner's Dilemma with Choice and Refusal. adap-org/9501002, 1995.
[4] R. Pool et al. Putting game theory to the test. Science, 1995.
[5] M. Nowak et al. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature, 1993.
[6] M. Oliphant. Evolving cooperation in the non-iterated prisoner's dilemma. 1994.
[7] H. Godfray. The evolution of forgiveness. Nature, 1992.
[8] Joë Metzger et al. Pour la science. 1974.
[9] Marcus Frean. The prisoner's dilemma without synchrony. Proceedings of the Royal Society of London, Series B: Biological Sciences, 1994.
[10] L. Tesfatsion et al. Preferential partner selection in an evolutionary study of Prisoner's Dilemma. BioSystems, 1994.
[11] M. Nowak et al. Tit for tat in heterogeneous populations. Nature, 1992.
[12] Kristian Lindgren et al. Evolutionary phenomena in simple dynamics. 1992.
[13] Rodney A. Brooks et al. Artificial Life IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems. 1994.
[14] R. Axelrod et al. The Further Evolution of Cooperation. Science, 1988.
[15] M. Nowak. Stochastic strategies in the Prisoner's Dilemma. 1990.
[16] Philippe Mathieu et al. Complex Strategies in the Iterated Prisoner's Dilemma. 1994.
[17] William Poundstone et al. Prisoner's Dilemma: John von Neumann, Game Theory and the Puzzle of the Bomb. 1992.
[18] W. Hamilton et al. The evolution of cooperation. Science, 1984.
[19] Xin Yao et al. An Experimental Study of N-Person Iterated Prisoner's Dilemma Games. Informatica, 1993.
[20] J. Bendor et al. In Good Times and Bad: Reciprocity in an Uncertain World. 1987.
[21] P. Molander. The Optimal Level of Generosity in a Selfish, Uncertain Environment. 1985.
[22] R. May. More evolution of cooperation. Nature, 1987.