Evolution and incremental learning in the iterative prisoner's dilemma

This paper investigates the use of evolution and incremental learning to find an optimal strategy in the iterative prisoner's dilemma (IPD), given an environment containing a collection of unknown strategies. The Meta-Lamarckian Memetic Learning (MLML) scheme is modeled on the biological evolution of humans and their ability to accumulate knowledge and learn from past experience. Learning was found to be the dominant force for improvement in the short run, while improvement in the long run is sustained by evolution. Learning is also much more effective when carried out incrementally as the games progress. Simulation results verified that the best performance is attained when learning and evolution are combined and carried out on an incremental basis, rather than when evolution or learning is used alone.
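For readers unfamiliar with the game itself, the following is a minimal sketch of how two fixed strategies accumulate payoffs over repeated rounds of the prisoner's dilemma. It does not implement the MLML scheme. The payoff values (5, 3, 1, 0) are the commonly used textbook values and are an assumption here, since the abstract does not state them; the names `play_ipd`, `tit_for_tat`, and `always_defect` are likewise illustrative.

```python
from typing import Callable, Dict, List, Tuple

# Assumed payoffs (commonly used values; not given in the abstract).
# Key: (my move, opponent's move) -> my payoff. "C" = cooperate, "D" = defect.
PAYOFF: Dict[Tuple[str, str], int] = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, opponent defects
    ("D", "C"): 5,  # I defect, opponent cooperates
    ("D", "D"): 1,  # mutual defection
}

# A strategy maps (own history, opponent history) to the next move.
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(own: List[str], opp: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opp else opp[-1]


def always_defect(own: List[str], opp: List[str]) -> str:
    """Defect on every round."""
    return "D"


def play_ipd(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play `rounds` rounds and return the total payoffs (a, b)."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both players choose simultaneously from the history so far.
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play_ipd(tit_for_tat, always_defect, rounds=10))
    print(play_ipd(tit_for_tat, tit_for_tat, rounds=10))
```

Under these assumed payoffs, a strategy that scores well against one opponent may score poorly against another, which is why the abstract frames the task as finding a good strategy against a collection of unknown opponents.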
