Visual Semantic Planning Using Deep Successor Representations

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a sequence of actions from visual observations that transform a dynamic environment from an initial state to a goal state. Doing so entails knowledge about objects and their affordances, as well as actions and their preconditions and effects. We propose learning these through interacting with a visual and dynamic environment. Our proposed solution involves bootstrapping reinforcement learning with imitation learning. To ensure cross task generalization, we develop a deep predictive model based on successor representations. Our experimental results show near optimal results across a wide range of tasks in the challenging THOR environment.

[1]  Richard Fikes,et al.  STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving , 1971, IJCAI.

[2]  Jürgen Schmidhuber,et al.  An on-line algorithm for dynamic reinforcement learning and planning in reactive environments , 1990, 1990 IJCNN International Joint Conference on Neural Networks.

[3]  Peter Dayan,et al.  Improving Generalization for Temporal Difference Learning: The Successor Representation , 1993, Neural Computation.

[4]  A. Noë,et al.  On the brain-basis of visual consciousness: A sensorimotor account , 2002 .

[5]  S. Shankar Sastry,et al.  Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.

[6]  Michael L. Anderson Embodied Cognition: A field guide , 2003, Artif. Intell..

[7]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[8]  Christos Dimitrakakis,et al.  TORCS, The Open Racing Car Simulator , 2005 .

[9]  Stefan Edelkamp,et al.  Automated Planning: Theory and Practice , 2007, Künstliche Intell..

[10]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[11]  Larry S. Davis,et al.  Event Modeling and Recognition Using Markov Logic Networks , 2008, ECCV.

[12]  John Langford,et al.  Search-based structured prediction , 2009, Machine Learning.

[13]  Bernhard Nebel,et al.  Integrating symbolic and geometric planning for mobile manipulation , 2009, 2009 IEEE International Workshop on Safety, Security & Rescue Robotics (SSRR 2009).

[14]  David Vázquez,et al.  Learning appearance in virtual scenarios for pedestrian detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Carl E. Rasmussen,et al.  PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.

[16]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[17]  Leslie Pack Kaelbling,et al.  Hierarchical task and motion planning in the now , 2011, 2011 IEEE International Conference on Robotics and Automation.

[18]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[19]  Pieter Abbeel,et al.  Using Classical Planners for Tasks with Continuous Operators in Robotics , 2013 .

[20]  James M. Rehg,et al.  Modeling Actions through State Changes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Jan Peters,et al.  Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..

[22]  James J. Gibson,et al.  The Ecological Approach to Visual Perception: Classic Edition , 2014 .

[23]  Pieter Abbeel,et al.  Combined task and motion planning through an extensible planner-independent interface layer , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[24]  Honglak Lee,et al.  Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.

[25]  Markus Schoeler,et al.  Semantic Pose Using Deep Networks Trained on Synthetic RGB-D , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[27]  Jianxiong Xiao,et al.  DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[29]  Marc G. Bellemare,et al.  The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.

[30]  Javier R. Movellan,et al.  Deep Q-learning for Active Recognition of GERMS: Baseline performance on a standardized dataset for active learning , 2015, BMVC.

[31]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[32]  Samuel Gershman,et al.  Deep Successor Reinforcement Learning , 2016, ArXiv.

[33]  James J. Little,et al.  Play and Learn: Using Video Games to Train Computer Vision Models , 2016, BMVC.

[34]  Roberto Cipolla,et al.  Understanding RealWorld Indoor Scenes with Synthetic Data , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Stefano Ermon,et al.  Generative Adversarial Imitation Learning , 2016, NIPS.

[36]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[37]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[38]  Joshua B. Tenenbaum,et al.  Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.

[39]  Pieter Abbeel,et al.  Value Iteration Networks , 2016, NIPS.

[40]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[41]  Jitendra Malik,et al.  Learning to Poke by Poking: Experiential Learning of Intuitive Physics , 2016, NIPS.

[42]  Ali Farhadi,et al.  Actions ~ Transformations , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Wojciech Jaskowski,et al.  ViZDoom: A Doom-based AI research platform for visual reinforcement learning , 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).

[45]  Mario Fritz,et al.  To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction , 2016, ArXiv.

[46]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[47]  Rob Fergus,et al.  Learning Physical Intuition of Block Towers by Example , 2016, ICML.

[48]  Abhinav Gupta,et al.  The Curious Robot: Learning Visual Representations via Physical Interactions , 2016, ECCV.

[49]  Ali Farhadi,et al.  "What Happens If..." Learning to Predict the Effect of Forces in Images , 2016, ECCV.

[50]  Abhinav Gupta,et al.  Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[51]  Stefano Ermon,et al.  Inferring The Latent Structure of Human Decision-Making from Raw Visual Inputs , 2017, NIPS 2017.

[52]  Misha Denil,et al.  Learning to Perform Physics Experiments via Deep Reinforcement Learning , 2016, ICLR.

[53]  Vladlen Koltun,et al.  Learning to Act by Predicting the Future , 2016, ICLR.

[54]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[55]  Tom Schaul,et al.  Successor Features for Transfer in Reinforcement Learning , 2016, NIPS.

[56]  Sergey Levine,et al.  Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).