Imitating Interactive Intelligence

A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans, using a virtual environment as a simplified setting. This setting nevertheless integrates a number of the central challenges of artificial intelligence (AI) research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical, so we approximate the role of the human with another learned agent and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond the literal experiences contained in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional human effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and that the challenge of reliably evaluating such agents can be surmounted.
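The abstract only names "ideas from inverse reinforcement learning". One widely used instance of such ideas is generative adversarial imitation learning (GAIL). In GAIL, a discriminator learns to tell human behaviour from agent behaviour, and the agent is rewarded for behaviour that the discriminator mistakes for human. What follows is a minimal sketch of that reward signal. It assumes a logistic-regression discriminator over hypothetical fixed-length feature vectors summarising (observation, action) pairs. The class name, features, and learning rate are illustrative assumptions, not the architecture used in this work.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LogisticDiscriminator:
    """Hypothetical discriminator: D(x) = P(feature vector x came from human data).

    Each x is an assumed fixed-length feature vector for an (observation, action)
    pair; a real agent would use a learned neural network instead.
    """

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_human(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return sigmoid(z)

    def update(self, human_batch, agent_batch):
        # One gradient-descent step on binary cross-entropy:
        # label 1 for human behaviour, label 0 for agent behaviour.
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        labelled = [(x, 1.0) for x in human_batch] + [(x, 0.0) for x in agent_batch]
        for x, y in labelled:
            err = self.prob_human(x) - y  # derivative of BCE w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(labelled)
        self.w = [wi - self.lr * gi / n for wi, gi in zip(self.w, grad_w)]
        self.b -= self.lr * grad_b / n

    def imitation_reward(self, x, eps=1e-8):
        # GAIL-style reward -log(1 - D(x)): large when agent behaviour looks human.
        return -math.log(1.0 - self.prob_human(x) + eps)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-ins: "human" features cluster near (1, 1), "agent" near (-1, -1).
    human = [[random.gauss(1.0, 0.3), random.gauss(1.0, 0.3)] for _ in range(200)]
    agent = [[random.gauss(-1.0, 0.3), random.gauss(-1.0, 0.3)] for _ in range(200)]
    disc = LogisticDiscriminator(dim=2, lr=0.5)
    for _ in range(200):
        disc.update(human, agent)
    print(disc.imitation_reward([1.0, 1.0]) > disc.imitation_reward([-1.0, -1.0]))
```

In a full adversarial imitation loop, discriminator updates would alternate with reinforcement-learning updates to the agent, so that the agent's behaviour is pushed toward the human distribution while the discriminator keeps trying to separate the two.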
