A Generalist Agent

Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm, and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
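The central mechanism implied by this description is that every modality, whether a subword of text or a continuous actuator command, is serialized into one shared token vocabulary, so that a single autoregressive network can predict the next token regardless of what it encodes. The Python sketch below is a hypothetical illustration of that idea only, not Gato's actual implementation: the vocabulary size, the uniform bin count, the greedy decoding, and the `model` interface are all assumptions made for exposition.

```python
# Illustrative sketch (assumptions throughout): continuous values share one
# token space with text by being bucketed into bins offset past the text
# vocabulary, and actions are produced by ordinary next-token prediction.
import numpy as np

TEXT_VOCAB = 32_000   # assumed subword vocabulary size
NUM_BINS = 1_024      # assumed number of discretization bins

def encode_continuous(x, low=-1.0, high=1.0):
    """Clip a continuous value (e.g. a joint torque) to [low, high],
    bucket it into NUM_BINS uniform bins, and offset it past the text
    vocabulary so both modalities live in one token space."""
    x = float(np.clip(x, low, high))
    bin_idx = int((x - low) / (high - low) * (NUM_BINS - 1))
    return TEXT_VOCAB + bin_idx

def decode_continuous(token, low=-1.0, high=1.0):
    """Inverse of encode_continuous: map a token back to a continuous value."""
    bin_idx = token - TEXT_VOCAB
    return low + bin_idx / (NUM_BINS - 1) * (high - low)

def act(model, context_tokens, action_dim):
    """One control step: sample the same autoregressive network that emits
    text tokens action_dim times, decoding each sampled token back into a
    continuous actuator command."""
    tokens = list(context_tokens)
    action = []
    for _ in range(action_dim):
        logits = model(tokens)               # assumed: sequence -> vocab logits
        next_token = int(np.argmax(logits))  # greedy decoding for brevity
        tokens.append(next_token)
        action.append(decode_continuous(next_token))
    return action

def dummy_model(tokens):
    """Stand-in for a trained network: always prefers the mid-range bin."""
    logits = np.zeros(TEXT_VOCAB + NUM_BINS)
    logits[TEXT_VOCAB + NUM_BINS // 2] = 1.0
    return logits

# A context mixing text tokens with an encoded joint angle, followed by
# two action tokens decoded back into continuous commands:
context = [17, 42, encode_continuous(0.3)]
print(act(dummy_model, context, action_dim=2))   # -> two values near 0.0
```

The design point the sketch makes concrete is that once actions and text share a single vocabulary, "deciding based on context what to output" reduces to ordinary next-token prediction, which is why one network with one set of weights can serve every embodiment.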
