Reinforcement Learning Soccer Teams with Incomplete World Models

We use reinforcement learning (RL) to compute strategies for multiagent soccer teams. RL may profit significantly from world models (WMs) estimating state transition probabilities and rewards. In high-dimensional, continuous input spaces, however, learning accurate WMs is intractable. Here we show that incomplete WMs can help to quickly find good action selection policies. Our approach is based on a novel combination of CMACs and prioritized sweeping-like algorithms. Variants thereof outperform both Q(λ)-learning with CMACs and the evolutionary method Probabilistic Incremental Program Evolution (PIPE) which performed best in previous comparisons.

[1]  Arthur L. Samuel,et al.  Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..

[2]  Leslie Pack Kaelbling,et al.  Learning in embedded systems , 1993 .

[3]  Richard S. Sutton,et al.  Reinforcement Learning , 1992, Handbook of Machine Learning.

[4]  Marco Wiering,et al.  Explorations in efficient reinforcement learning , 1999 .

[5]  Ingo Rechenberg,et al.  Evolutionsstrategie : Optimierung technischer Systeme nach Prinzipien der biologischen Evolution , 1973 .

[6]  Jürgen Schmidhuber,et al.  Efficient model-based exploration , 1998 .

[7]  Nichael Lynn Cramer,et al.  A Representation for the Adaptive Generation of Simple Sequential Programs , 1985, ICGA.

[8]  Jürgen Schmidhuber,et al.  Reinforcement Learning with Self-Modifying Policies , 1998, Learning to Learn.

[9]  Richard S. Sutton,et al.  Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.

[10]  John R. Koza,et al.  Genetic evolution and co-evolution of computer programs , 1991 .

[11]  Juergen Schmidhuber,et al.  On learning how to learn learning strategies , 1994 .

[12]  R. Bellman,et al.  V. Adaptive Control Processes , 1964 .

[13]  Jürgen Schmidhuber,et al.  Evolving Soccer Strategies , 1997, ICONIP.

[14]  John H. Holland,et al.  Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .

[15]  Chris Watkins,et al.  Learning from delayed rewards , 1989 .

[16]  W. Vent,et al.  Rechenberg, Ingo, Evolutionsstrategie — Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. 170 S. mit 36 Abb. Frommann‐Holzboog‐Verlag. Stuttgart 1973. Broschiert , 1975 .

[17]  Ashwin Ram,et al.  Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..

[18]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[19]  Richard S. Sutton,et al.  Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.

[20]  Leslie Pack Kaelbling,et al.  Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.

[21]  Michael Kearns,et al.  Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.

[22]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[23]  Rich Caruana,et al.  Removing the Genetics from the Standard Genetic Algorithm , 1995, ICML.

[24]  Corso Elvezia Probabilistic Incremental Program Evolution , 1997 .

[25]  Jürgen Schmidhuber,et al.  On Learning Soccer Strategies , 1997, ICANN.

[26]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[27]  Rafal Salustowicz,et al.  Probabilistic Incremental Program Evolution , 1997, Evolutionary Computation.

[28]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[29]  Stewart W. Wilson,et al.  From Animals to Animats 5. Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior , 1997 .

[30]  James S. Albus,et al.  New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC)1 , 1975 .

[31]  Geoffrey E. Hinton,et al.  Simplifying Neural Networks by Soft Weight-Sharing , 1992, Neural Computation.