Learning Team Strategies: Soccer Case Studies

We use simulated soccer to study multiagent learning. The players (agents) of a team share an action set and a policy, but may behave differently because their inputs depend on their positions. All agents of a team are rewarded or punished collectively whenever a goal is scored. We run simulations with varying team sizes and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE variant that learns by coevolution (CO-PIPE). TD-Q learns evaluation functions (EFs) that map input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly: they use adaptive probability distributions to synthesize programs that compute action probabilities from the current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, which do not depend on EFs, find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.
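To make the EF-based approach concrete, the sketch below shows how a shared linear evaluation function could be updated with a one-step Q-learning rule. This is a minimal illustration, not the paper's exact setup: the feature size, the action names, the learning rate, and the discount factor are all assumptions introduced here for clarity.

```python
import numpy as np

# Minimal sketch of a shared linear evaluation function (EF) for TD-Q.
# Feature size, action set, learning rate and discount are illustrative
# assumptions, not the paper's actual settings.
N_FEATURES = 16      # position-dependent input features seen by an agent
N_ACTIONS = 4        # e.g. go-to-ball, shoot, pass, stay (hypothetical)
ALPHA, GAMMA = 0.01, 0.95

# One weight vector per action; all agents of a team share these weights.
W = np.zeros((N_ACTIONS, N_FEATURES))

def q_value(features, action):
    """Linear EF: estimated expected reward for `action` given `features`."""
    return W[action] @ features

def td_q_update(features, action, reward, next_features, done):
    """One-step Q-learning update of the shared linear EF."""
    target = reward
    if not done:
        target += GAMMA * max(q_value(next_features, a) for a in range(N_ACTIONS))
    td_error = target - q_value(features, action)
    W[action] += ALPHA * td_error * features   # gradient step for a linear model
```

Because every agent of a team updates the same weight matrix from its own position-dependent inputs, conflicting updates can arise, which is one intuition for why a shared linear EF may be hard to learn in this setting.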
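By contrast, PIPE and CO-PIPE sample candidate policies (programs) from an adaptive probability distribution and shift that distribution toward programs that earn high collective reward. The toy sketch below is a deliberately simplified version: it uses a flat distribution over fixed-length instruction sequences instead of PIPE's probabilistic prototype tree, and the instruction set, program length, learning rate, and fitness function are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

INSTRUCTIONS = ["+", "-", "*", "x0", "x1", "0.5"]   # illustrative instruction set
PROGRAM_LEN = 8
LEARNING_RATE = 0.1

# Adaptive distribution: one categorical per program position
# (PIPE itself adapts a probabilistic prototype tree; this is a flat simplification).
probs = np.full((PROGRAM_LEN, len(INSTRUCTIONS)), 1.0 / len(INSTRUCTIONS))

def sample_program():
    """Sample a program as a sequence of instruction indices."""
    return [rng.choice(len(INSTRUCTIONS), p=probs[i]) for i in range(PROGRAM_LEN)]

def fitness(program):
    """Placeholder objective. In the soccer task this would be the team's
    collective reward (e.g. goals scored minus goals conceded) when all
    agents use the program as their shared policy."""
    return -abs(sum(program) - 20)          # dummy objective for illustration

def adapt(best_program):
    """Shift the distribution toward the instructions of the best program."""
    for i, instr in enumerate(best_program):
        probs[i] *= (1.0 - LEARNING_RATE)
        probs[i, instr] += LEARNING_RATE
        probs[i] /= probs[i].sum()          # renormalize the categorical

for generation in range(50):
    population = [sample_program() for _ in range(20)]
    best = max(population, key=fitness)
    adapt(best)
```

The key point the sketch illustrates is that no value estimate is ever formed: the search operates directly on the distribution over policies, using only the collective reward of sampled programs.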
