Learning Human Search Behavior from Egocentric Visual Inputs

"Looking for things" is a mundane but critical task we repeatedly carry on in our daily life. We introduce a method to develop a human character capable of searching for a randomly located target object in a detailed 3D scene using its locomotion capability and egocentric vision perception represented as RGBD images. By depriving the privileged 3D information from the human character, it is forced to move and look around simultaneously to account for the restricted sensing capability, resulting in natural navigation and search behaviors. Our method consists of two components: 1) a search control policy based on an abstract character model, and 2) an online replanning control module for synthesizing detailed kinematic motion based on the trajectories planned by the search policy. We demonstrate that the combined techniques enable the character to effectively find often occluded household items in indoor environments. The same search policy can be applied to different full-body characters without the need for retraining. We evaluate our method quantitatively by testing it on randomly generated scenarios. Our work is a first step toward creating intelligent virtual agents with humanlike behaviors driven by onboard sensors, paving the road toward future robotic applications.
