From Motor Control to Team Play in Simulated Humanoid Football

Siqi Liu*, Guy Lever*, Zhe Wang*, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel and Nicolas Heess*

*Equal contributions. All authors are affiliated with DeepMind.
