Colearning in Differential Games

Game playing has long been a popular research area in artificial intelligence and machine learning. Almost every study of machine learning for game playing has focused on games with a finite set of states and a finite set of actions. Furthermore, most of this research has considered a single player or team learning to play against another player or team that follows a fixed strategy. In this paper, we explore multiagent learning in differential games, which have continuous state spaces, and develop algorithms for “co-learning,” in which all players attempt to learn their optimal strategies simultaneously. Specifically, we examine two approaches to co-learning, demonstrating strong performance by a memory-based reinforcement learner and comparable performance, achieved more quickly, by a tree-based reinforcement learner.
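The co-learning setup can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the paper's method. It uses a hypothetical one-dimensional pursuit game in which a faster pursuer chases an evader along a line, stepped in discrete time. It substitutes plain independent Q-learning, where each player treats the other as part of the environment, for whatever update rule the authors use. It also introduces a hypothetical `MemoryQ` class as a simple stand-in for a memory-based reinforcement learner: it estimates Q(s, a) as the mean stored target of the k nearest remembered states. Both players update on every step, so neither faces a fixed opponent.

```python
import random

# Hypothetical action set for both players: move left, stay, move right.
ACTIONS = (-1.0, 0.0, 1.0)


class MemoryQ:
    """Hypothetical memory-based Q estimator: stores (state, target) instances
    per action and predicts Q(s, a) as the mean target of the k nearest ones."""

    def __init__(self, k=5, default=0.0):
        self.k = k
        self.default = default
        self.memory = {a: [] for a in ACTIONS}  # action -> list of (state, target)

    def predict(self, s, a):
        instances = self.memory[a]
        if not instances:
            return self.default
        # Linear-scan k-nearest-neighbor lookup on the 1-D state.
        nearest = sorted(instances, key=lambda st: abs(st[0] - s))[: self.k]
        return sum(t for _, t in nearest) / len(nearest)

    def best(self, s):
        return max(self.predict(s, a) for a in ACTIONS)

    def act(self, s, eps):
        # Epsilon-greedy action selection over the memory-based estimates.
        if random.random() < eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.predict(s, a))

    def update(self, s, a, r, s_next, done, gamma=0.9):
        # One-step Q-learning target, stored as a new memory instance.
        target = r if done else r + gamma * self.best(s_next)
        self.memory[a].append((s, target))


def step(d, a_p, a_e, v_p=1.0, v_e=0.5):
    """Hypothetical dynamics: d = evader position minus pursuer position."""
    d_next = d + v_e * a_e - v_p * a_p
    return d_next, abs(d_next) < 0.5  # captured if within 0.5


def colearn(episodes=200, max_steps=30, eps=0.2, seed=0):
    """Both players learn at the same time against each other."""
    random.seed(seed)
    pursuer, evader = MemoryQ(), MemoryQ()
    lengths = []
    for _ in range(episodes):
        d = random.uniform(-5.0, 5.0)
        for t in range(1, max_steps + 1):
            a_p = pursuer.act(d, eps)
            a_e = evader.act(d, eps)
            d_next, captured = step(d, a_p, a_e)
            done = captured or t == max_steps  # timeout treated as terminal
            r_p = 1.0 if captured else -0.1  # zero-sum: evader receives -r_p
            pursuer.update(d, a_p, r_p, d_next, done)
            evader.update(d, a_e, -r_p, d_next, done)
            d = d_next
            if done:
                break
        lengths.append(t)
    return lengths, pursuer, evader


if __name__ == "__main__":
    lengths, _, _ = colearn()
    print("mean episode length, last 50 episodes:", sum(lengths[-50:]) / 50)
```

The linear scan in `predict` is the simplest nearest-neighbor search and grows slower as memory accumulates; a spatial index or a learned tree would be natural replacements when speed matters.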
