Visual Perspective Taking for Opponent Behavior Modeling

To engage in complex social interaction, humans learn at a young age to infer what others can and cannot see from a different point of view, and to predict others’ plans and behaviors. These abilities have been mostly lacking in robots, sometimes making them appear awkward and socially inept. Here we propose an end-to-end long-term visual prediction framework for robots to begin to acquire both of these critical cognitive skills, known as Visual Perspective Taking (VPT) and Theory of Behavior (TOB). We demonstrate our approach in the context of visual hide-and-seek, a game that represents a cognitive milestone in human development. Unlike traditional visual predictive models that generate new frames from immediate past frames, our agent can directly predict multiple future timestamps (25 s), extrapolating by 175% beyond the training horizon. We suggest that visual behavior modeling and perspective-taking skills will play a critical role in the ability of physical robots to fully integrate into real-world multi-agent activities.
