From Motor Control to Team Play in Simulated Humanoid Football

Siqi Liu*, Guy Lever*, Zhe Wang*, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel and Nicolas Heess*

*Equal contributions. All authors are affiliated with DeepMind.
