Transfer Value Iteration Networks

Value iteration networks (VINs) have been shown to generalize well in reinforcement learning tasks across similar domains. However, in our experiments, a policy learned by a VIN still fails to generalize well to a domain whose action space and feature space are not identical to those of the domain it was trained on. In this paper, we propose a transfer learning approach on top of VINs, termed Transfer VINs (TVINs), such that a policy learned in a source domain can be generalized to a target domain with only limited training data, even if the source and target domains have domain-specific actions and features. We empirically verify that our proposed TVINs outperform VINs when the source and target domains have similar but not identical action and feature spaces. Furthermore, we show that the performance improvement is consistent across different environments, maze sizes, and dataset sizes, as well as across hyperparameter settings such as the number of value-iteration steps and the kernel size.
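To make the hyperparameters named above concrete: the planning module at the core of a VIN applies K rounds of a convolution (standing in for the transition and reward kernels) followed by a max over action channels, and both K and the convolution's kernel size are the knobs the abstract refers to. The following PyTorch sketch illustrates that recurrence only; the class and argument names (VIModule, n_actions, k_iterations) are illustrative choices of ours, not identifiers from the paper.

import torch
import torch.nn as nn

class VIModule(nn.Module):
    """Minimal VIN-style planning module: K rounds of conv (Q) then max over actions (V)."""
    def __init__(self, n_actions=8, kernel_size=3, k_iterations=10):
        super().__init__()
        self.k = k_iterations
        # The convolution weights play the role of the transition/reward kernels.
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size,
                                padding=kernel_size // 2, bias=False)

    def forward(self, reward_map):
        # reward_map: (batch, 1, H, W) reward image derived from the maze features
        v = torch.zeros_like(reward_map)
        for _ in range(self.k):
            q = self.q_conv(torch.cat([reward_map, v], dim=1))  # Q(s, a) for all actions
            v, _ = torch.max(q, dim=1, keepdim=True)            # V(s) = max_a Q(s, a)
        return v

In a transfer setting along the lines described in the abstract, one would reuse such a pre-trained planning module in the target domain and fine-tune only the domain-specific input and output layers on the limited target data; the exact transfer architecture is specified in the body of the paper, not here.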
