Deep Reinforcement Learning Visual Navigation Model Integrating Memory-prediction Mechanism

Deep reinforcement learning (DRL) has been widely used in visual navigation. However, because DRL adapts poorly to new tasks, current DRL-based visual navigation models generalize poorly. To address this deficiency, we introduce a memory-prediction mechanism: by strengthening the agent's memory of the scene and drawing on past navigation experience to predict the next state, the agent can select more reasonable actions. First, the image features extracted during navigation are passed to an LSTM, which memorizes the scene information they contain. Then, the information available at each time step (state, target, and action) is combined, and the history over multiple time steps is passed to a second LSTM that predicts the next state. The action performed by the robot is determined from this predicted state. Experiments conducted in the AI2-THOR framework show that the proposed method improves both the navigation performance of the DRL visual navigation model and its adaptability to new tasks.
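The two-LSTM pipeline described above can be illustrated with a minimal, untrained forward pass in plain Python. This is a sketch under several assumptions that the abstract does not state: the hidden output of the memory LSTM is treated as the "state", the per-step "action" input is a one-hot encoding of the previous action, the history window length (`history`) and hidden size are arbitrary, and a hypothetical linear head with greedy argmax maps the predicted state to an action. All names (`LSTMCell`, `MemoryPredictionNavigator`, `act`) are illustrative, random vectors stand in for CNN image features, and the weights are random, so the chosen actions carry no meaning until the model is trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell (with forget gate) and random, untrained weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate (input, forget, output, candidate),
        # applied to the concatenated [input, previous hidden] vector.
        self.W = {g: rand_matrix(hidden_size, input_size + hidden_size, rng) for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def initial_state(self):
        return [0.0] * self.hidden_size, [0.0] * self.hidden_size

    def step(self, x, state):
        h, c = state
        z = list(x) + list(h)
        pre = {g: [wz + bias for wz, bias in zip(matvec(self.W[g], z), self.b[g])]
               for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


class MemoryPredictionNavigator:
    """Hypothetical sketch: memory LSTM -> per-step [state, target, action]
    -> prediction LSTM over a history window -> action from predicted state."""

    def __init__(self, feat_dim, target_dim, num_actions, hidden=8, history=4, seed=0):
        rng = random.Random(seed)
        self.num_actions, self.history = num_actions, history
        # Memory LSTM: reads image features and keeps scene information.
        self.memory_lstm = LSTMCell(feat_dim, hidden, rng)
        # Prediction LSTM: reads [state, target, one-hot previous action] per step.
        self.predict_lstm = LSTMCell(hidden + target_dim + num_actions, hidden, rng)
        # Assumed linear policy head mapping the predicted state to action scores.
        self.policy = rand_matrix(num_actions, hidden, rng)
        self.reset()

    def reset(self):
        """Clear memory at the start of an episode."""
        self.mem_state = self.memory_lstm.initial_state()
        self.steps = []
        self.prev_action = None

    def one_hot(self, a):
        v = [0.0] * self.num_actions
        if a is not None:
            v[a] = 1.0
        return v

    def act(self, image_features, target):
        # 1) Update scene memory; its hidden output is treated as the state.
        self.mem_state = self.memory_lstm.step(image_features, self.mem_state)
        state = self.mem_state[0]
        # 2) Combine state, target, and previous action; keep the last `history` steps.
        self.steps.append(state + list(target) + self.one_hot(self.prev_action))
        self.steps = self.steps[-self.history:]
        # 3) Run the prediction LSTM over the window to predict the next state.
        pred = self.predict_lstm.initial_state()
        for v in self.steps:
            pred = self.predict_lstm.step(v, pred)
        predicted_next_state = pred[0]
        # 4) Pick the highest-scoring action for the predicted state (greedy).
        scores = matvec(self.policy, predicted_next_state)
        action = max(range(self.num_actions), key=lambda k: scores[k])
        self.prev_action = action
        return action, predicted_next_state


# Usage: random vectors stand in for CNN image features from AI2-THOR frames.
rng = random.Random(1)
nav = MemoryPredictionNavigator(feat_dim=6, target_dim=3, num_actions=4)
target = [1.0, 0.0, 0.0]  # hypothetical one-hot target encoding
for t in range(5):
    action, predicted = nav.act([rng.uniform(-1, 1) for _ in range(6)], target)
    print(t, action, len(predicted))
```

In this sketch the prediction LSTM re-reads the stored window from a zero state at every step, which is one simple reading of "historical information of multiple time steps"; carrying its state across steps would be an equally plausible choice. The training objective is omitted because the abstract does not specify it.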
