Value Iteration Networks

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
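As a rough illustration of the idea behind the planning module, the sketch below runs value iteration on a small grid world by repeatedly combining a reward map with locally shifted copies of the value map (the effect of convolving with one-hot transition kernels) and taking a max over action channels. The fixed kernels, the 8x8 grid, and the helper name vi_module are illustrative assumptions for this sketch, not the paper's implementation; in a VIN the corresponding filters and reward map are learned end-to-end rather than hand-coded.

import numpy as np

def vi_module(reward, num_iters=30, discount=0.99):
    """Value iteration on a 2D grid as repeated local backups:
    shift the value map per action (a one-hot 'convolution'),
    add the reward, and max over the action channels.

    reward: (H, W) array of per-cell rewards.
    Returns the (H, W) value map after num_iters backups.
    """
    H, W = reward.shape
    # One deterministic move per action: stay, up, down, left, right.
    # In a trained VIN these transition filters are learned; here they
    # are fixed shifts purely for illustration.
    actions = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    value = np.zeros((H, W))
    for _ in range(num_iters):
        padded = np.pad(value, 1, mode="edge")  # clamp values at the border
        q = np.empty((len(actions), H, W))
        for a, (di, dj) in enumerate(actions):
            # Q(s, a) = r(s) + gamma * V(state reached by action a)
            next_v = padded[1 + di:1 + di + H, 1 + dj:1 + dj + W]
            q[a] = reward + discount * next_v
        value = q.max(axis=0)  # Bellman backup: max over action channels
    return value

if __name__ == "__main__":
    # Tiny grid world: small step cost everywhere, positive reward at the goal.
    reward = -0.01 * np.ones((8, 8))
    reward[7, 7] = 1.0
    values = vi_module(reward)
    print(np.round(values, 2))  # values increase toward the goal cell

Stacking a fixed number of such backup iterations yields a convolutional computation whose depth plays the role of the planning horizon, which is what allows the whole module to be differentiated and trained with standard backpropagation.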
