Learning Team Strategies: Soccer Case Studies

We use simulated soccer to study multiagent learning. The players (agents) of a team share an action set and a policy, but may behave differently because their inputs depend on their positions. All agents of a team are rewarded or punished collectively whenever a goal is scored. We run simulations with varying team sizes and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE variant that learns by coevolution (CO-PIPE). TD-Q learns evaluation functions (EFs) that map input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly: they use adaptive probability distributions to synthesize programs that compute action probabilities from the current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, which do not depend on EFs, find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.
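To make the EF-based approach concrete, the sketch below shows how a shared linear evaluation function could be updated with a one-step Q-learning rule. This is a minimal illustration, not the paper's exact setup: the feature size, the action names, the learning rate, and the discount factor are all assumptions introduced here for clarity.

```python
import numpy as np

# Minimal sketch of a shared linear evaluation function (EF) for TD-Q.
# Feature size, action set, learning rate and discount are illustrative
# assumptions, not the paper's actual settings.
N_FEATURES = 16      # position-dependent input features seen by an agent
N_ACTIONS = 4        # e.g. go-to-ball, shoot, pass, stay (hypothetical)
ALPHA, GAMMA = 0.01, 0.95

# One weight vector per action; all agents of a team share these weights.
W = np.zeros((N_ACTIONS, N_FEATURES))

def q_value(features, action):
    """Linear EF: estimated expected reward for `action` given `features`."""
    return W[action] @ features

def td_q_update(features, action, reward, next_features, done):
    """One-step Q-learning update of the shared linear EF."""
    target = reward
    if not done:
        target += GAMMA * max(q_value(next_features, a) for a in range(N_ACTIONS))
    td_error = target - q_value(features, action)
    W[action] += ALPHA * td_error * features   # gradient step for a linear model
```

Because every agent of a team updates the same weight matrix from its own position-dependent inputs, conflicting updates can arise, which is one intuition for why a shared linear EF may be hard to learn in this setting.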
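By contrast, PIPE and CO-PIPE sample candidate policies (programs) from an adaptive probability distribution and shift that distribution toward programs that earn high collective reward. The toy sketch below is a deliberately simplified version: it uses a flat distribution over fixed-length instruction sequences instead of PIPE's probabilistic prototype tree, and the instruction set, program length, learning rate, and fitness function are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

INSTRUCTIONS = ["+", "-", "*", "x0", "x1", "0.5"]   # illustrative instruction set
PROGRAM_LEN = 8
LEARNING_RATE = 0.1

# Adaptive distribution: one categorical per program position
# (PIPE itself adapts a probabilistic prototype tree; this is a flat simplification).
probs = np.full((PROGRAM_LEN, len(INSTRUCTIONS)), 1.0 / len(INSTRUCTIONS))

def sample_program():
    """Sample a program as a sequence of instruction indices."""
    return [rng.choice(len(INSTRUCTIONS), p=probs[i]) for i in range(PROGRAM_LEN)]

def fitness(program):
    """Placeholder objective. In the soccer task this would be the team's
    collective reward (e.g. goals scored minus goals conceded) when all
    agents use the program as their shared policy."""
    return -abs(sum(program) - 20)          # dummy objective for illustration

def adapt(best_program):
    """Shift the distribution toward the instructions of the best program."""
    for i, instr in enumerate(best_program):
        probs[i] *= (1.0 - LEARNING_RATE)
        probs[i, instr] += LEARNING_RATE
        probs[i] /= probs[i].sum()          # renormalize the categorical

for generation in range(50):
    population = [sample_program() for _ in range(20)]
    best = max(population, key=fitness)
    adapt(best)
```

The key point the sketch illustrates is that no value estimate is ever formed: the search operates directly on the distribution over policies, using only the collective reward of sampled programs.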
