Generalized Value Iteration Networks: Life Beyond Lattices

In this paper, we introduce the generalized value iteration network (GVIN), an end-to-end neural network planning module. GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. We propose three novel differentiable kernels as graph convolution operators and show that the embedding-based kernel achieves the best performance. We further propose episodic Q-learning, an improvement upon traditional n-step Q-learning that stabilizes training for networks that contain a planning module. Lastly, we evaluate GVIN on planning problems in 2D mazes, irregular graphs, and real-world street networks, showing that GVIN generalizes well both to arbitrary graphs and to unseen graphs of larger scale, and outperforms a naive generalization of VIN (discretizing a spatial graph into a 2D image).
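To make the core idea concrete, the sketch below (not the authors' implementation) shows how value iteration over a graph can be expressed as a graph-convolution-style update: each action channel applies a transition matrix to the current value vector, and the value is the maximum over channels. The action-conditioned transition matrices P, reward vector r, and discount gamma are illustrative assumptions; in GVIN such transition kernels would be produced by the learned differentiable graph kernels rather than given by hand.

```python
# Minimal sketch of value iteration on a graph via per-action "graph convolutions".
# Assumptions (not from the paper): P is a stack of row-stochastic transition
# matrices, r is a per-node reward, and the kernels are fixed rather than learned.
import numpy as np

def graph_value_iteration(P, r, gamma=0.99, iters=50):
    """P: (A, N, N) action-conditioned transition matrices over graph nodes.
       r: (N,) reward per node.  Returns a value estimate V of shape (N,)."""
    A, N, _ = P.shape
    V = np.zeros(N)
    for _ in range(iters):
        # Q-update: one propagation per action channel, Q[a] = r + gamma * P[a] @ V
        Q = r[None, :] + gamma * np.einsum('aij,j->ai', P, V)
        V = Q.max(axis=0)  # max over action channels
    return V

# Toy usage: a 3-node chain with two actions (step left / step right);
# the goal reward sits at node 2.
P = np.zeros((2, 3, 3))
P[0] = [[1, 0, 0], [1, 0, 0], [0, 1, 0]]   # action 0: step toward node 0
P[1] = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]   # action 1: step toward node 2
r = np.array([0.0, 0.0, 1.0])
print(graph_value_iteration(P, r))
```

In GVIN this inner loop is wrapped in a differentiable module, so the transition kernels and rewards are trained end-to-end from planning outcomes rather than specified as above.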
