Value Iteration Networks

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
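As a rough illustration of the idea behind the planning module, the sketch below runs value iteration on a small grid world by repeatedly combining a reward map with locally shifted copies of the value map (the effect of convolving with one-hot transition kernels) and taking a max over action channels. The fixed kernels, the 8x8 grid, and the helper name vi_module are illustrative assumptions for this sketch, not the paper's implementation; in a VIN the corresponding filters and reward map are learned end-to-end rather than hand-coded.

import numpy as np

def vi_module(reward, num_iters=30, discount=0.99):
    """Value iteration on a 2D grid as repeated local backups:
    shift the value map per action (a one-hot 'convolution'),
    add the reward, and max over the action channels.

    reward: (H, W) array of per-cell rewards.
    Returns the (H, W) value map after num_iters backups.
    """
    H, W = reward.shape
    # One deterministic move per action: stay, up, down, left, right.
    # In a trained VIN these transition filters are learned; here they
    # are fixed shifts purely for illustration.
    actions = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    value = np.zeros((H, W))
    for _ in range(num_iters):
        padded = np.pad(value, 1, mode="edge")  # clamp values at the border
        q = np.empty((len(actions), H, W))
        for a, (di, dj) in enumerate(actions):
            # Q(s, a) = r(s) + gamma * V(state reached by action a)
            next_v = padded[1 + di:1 + di + H, 1 + dj:1 + dj + W]
            q[a] = reward + discount * next_v
        value = q.max(axis=0)  # Bellman backup: max over action channels
    return value

if __name__ == "__main__":
    # Tiny grid world: small step cost everywhere, positive reward at the goal.
    reward = -0.01 * np.ones((8, 8))
    reward[7, 7] = 1.0
    values = vi_module(reward)
    print(np.round(values, 2))  # values increase toward the goal cell

Stacking a fixed number of such backup iterations yields a convolutional computation whose depth plays the role of the planning horizon, which is what allows the whole module to be differentiated and trained with standard backpropagation.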
