Pushing it out of the Way: Interactive Visual Navigation

We have observed significant progress in visual navigation for embodied agents. A common assumption in studying visual navigation is that the environments are static; this is a limiting assumption. Intelligent navigation may involve interacting with the environment beyond just moving forward/backward and turning left/right. Sometimes, the best way to navigate is to push something out of the way. In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals. To this end, we introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent’s actions. By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities. More specifically, we consider two downstream tasks in the physics-enabled, visually rich, AI2-THOR environment: (1) reaching a target while the path to the target is blocked (2) moving an object to a target location by pushing it. For both tasks, agents equipped with an NIE significantly outperform agents without the understanding of the effect of the actions indicating the benefits of our approach. The code and dataset are available at github.com/KuoHaoZeng/Interactive_Visual_Navigation.

[1]  Roozbeh Mottaghi,et al.  AllenAct: A Framework for Embodied AI Research , 2020, ArXiv.

[2]  David Hsu,et al.  Push-Net: Deep Planar Pushing for Objects with Unknown Physical Properties , 2018, Robotics: Science and Systems.

[3]  Tamim Asfour,et al.  Predicting Pushing Action Effects on Spatial Object Relations by Learning Internal Prediction Models , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[4]  James J. Kuffner,et al.  Navigation among movable obstacles: real-time reasoning in complex environments , 2004, 4th IEEE/RAS International Conference on Humanoid Robots, 2004..

[5]  Jonathan Tompson,et al.  Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning , 2018, NeurIPS.

[6]  Razvan Pascanu,et al.  Visual Interaction Networks: Learning a Physics Simulator from Video , 2017, NIPS.

[7]  Ali Farhadi,et al.  Visual Reaction: Learning to Play Catch With Your Drone , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ruslan Salakhutdinov,et al.  Learning to Explore using Active Neural SLAM , 2020, ICLR.

[9]  Sergey Levine,et al.  Reasoning About Physical Interactions with Object-Oriented Prediction and Planning , 2018, ICLR.

[10]  Jitendra Malik,et al.  Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Liang Zheng,et al.  Learning Object Relation Graph and Tentative Policy for Visual Navigation , 2020, ECCV.

[12]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[13]  Sergey Levine,et al.  Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control , 2018, ArXiv.

[14]  Andrew Jaegle,et al.  Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban , 2020, ArXiv.

[15]  Ali Farhadi,et al.  "What Happens If..." Learning to Predict the Effect of Forces in Images , 2016, ECCV.

[16]  Dieter Fox,et al.  SE3-nets: Learning rigid body motion using deep neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[17]  Silvio Savarese,et al.  Interactive Gibson Benchmark: A Benchmark for Interactive Navigation in Cluttered Environments , 2020, IEEE Robotics and Automation Letters.

[18]  Wolfram Burgard,et al.  Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[19]  Jitendra Malik,et al.  Learning to Poke by Poking: Experiential Learning of Intuitive Physics , 2016, NIPS.

[20]  Jiajun Wu,et al.  Learning to See Physics via Visual De-animation , 2017, NIPS.

[21]  Ruslan Salakhutdinov,et al.  Object Goal Navigation using Goal-Oriented Semantic Exploration , 2020, NeurIPS.

[22]  Ali Farhadi,et al.  Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Daniel L. K. Yamins,et al.  Visual Grounding of Learned Physical Models , 2020, ICML.

[24]  S. Levine,et al.  Predictive Visual Models of Physics for Playing Billiards , 2015 .

[25]  Jitendra Malik,et al.  On Evaluation of Embodied Navigation Agents , 2018, ArXiv.

[26]  Ali Farhadi,et al.  AI2-THOR: An Interactive 3D Environment for Visual AI , 2017, ArXiv.

[27]  James J. Kuffner,et al.  Planning Among Movable Obstacles with Artificial Constraints , 2008, WAFR.

[28]  Rémi Munos,et al.  Neural Predictive Belief Representations , 2018, ArXiv.

[29]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[30]  Jiajun Wu,et al.  DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions , 2019, Robotics: Science and Systems.

[31]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[32]  Andrew Jaegle,et al.  Physically Embedded Planning Problems: New Challenges for Reinforcement Learning , 2020, ArXiv.

[33]  Roozbeh Mottaghi,et al.  Rearrangement: A Challenge for Embodied AI , 2020, ArXiv.

[34]  Jiajun Wu,et al.  Physics 101: Learning Physical Object Properties from Unlabeled Videos , 2016, BMVC.

[35]  Silvio Savarese,et al.  ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation , 2020, ArXiv.

[36]  Sergey Levine,et al.  Self-Supervised Visual Planning with Temporal Skip Connections , 2017, CoRL.

[37]  Joshua B. Tenenbaum,et al.  A Compositional Object-Based Approach to Learning Physical Dynamics , 2016, ICLR.

[38]  Ali Farhadi,et al.  Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Juan Carlos Niebles,et al.  Visual Forecasting by Imitating Dynamics in Natural Sequences , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Ankush Gupta,et al.  Unsupervised Learning of Object Keypoints for Perception and Control , 2019, NeurIPS.

[41]  Sergey Levine,et al.  Deep visual foresight for planning robot motion , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[42]  Roozbeh Mottaghi,et al.  ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects , 2020, ArXiv.