Visual Navigation for Biped Humanoid Robots Using Deep Reinforcement Learning

In this letter, we propose a map-less visual navigation system for biped humanoid robots that extracts information from color images and derives motion commands using deep reinforcement learning (DRL). The navigation policy is trained using the Deep Deterministic Policy Gradients (DDPG) algorithm, an actor-critic DRL algorithm. The algorithm is implemented using two separate networks with similar structures, one for the actor and one for the critic. In addition to convolutional and fully connected layers, Long Short-Term Memory (LSTM) layers are included to address the partial observability of the problem. As a proof of concept, we consider robotic soccer with humanoid NAO V5 robots, which have limited computational capabilities and use low-cost red-green-blue (RGB) cameras as their main sensors. Using DRL made it possible to obtain a complex, high-performing policy from scratch, without any prior knowledge of the domain or of the dynamics involved. The visual navigation policy is trained in a robotic simulator and then successfully transferred to a physical robot, where it runs in 20 ms, fast enough for real-time applications.
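The actor-critic arrangement described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The convolutional front end is omitted and replaced by a hypothetical fixed-length feature vector. The layer sizes, the three-dimensional action, the tanh output squashing, and feeding the action to the critic at its input are all assumptions made for illustration. The sketch only shows two similarly structured networks whose LSTM state is carried from frame to frame, so that identical observations can yield different outputs.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(weights, bias, inputs):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def init_layer(rng, n_out, n_in, scale=0.1):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

class LSTMCell:
    """Standard LSTM cell; a single dense layer produces all four gates."""
    def __init__(self, rng, n_in, n_hidden):
        self.n_hidden = n_hidden
        self.weights, self.bias = init_layer(rng, 4 * n_hidden, n_in + n_hidden)

    def initial_state(self):
        return [0.0] * self.n_hidden, [0.0] * self.n_hidden

    def step(self, x, state):
        h, c = state
        n = self.n_hidden
        z = linear(self.weights, self.bias, x + h)   # concatenate input and h
        i = [sigmoid(v) for v in z[0:n]]             # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]         # forget gate
        g = [math.tanh(v) for v in z[2 * n:3 * n]]   # candidate cell values
        o = [sigmoid(v) for v in z[3 * n:4 * n]]     # output gate
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

class Actor:
    """Maps image features plus recurrent state to a bounded action."""
    def __init__(self, rng, n_features, n_hidden, n_actions):
        self.lstm = LSTMCell(rng, n_features, n_hidden)
        self.head = init_layer(rng, n_actions, n_hidden)

    def act(self, features, state):
        h, c = self.lstm.step(features, state)
        action = [math.tanh(v) for v in linear(*self.head, h)]
        return action, (h, c)

class Critic:
    """Same structure as the actor, but scores a (features, action) pair."""
    def __init__(self, rng, n_features, n_hidden, n_actions):
        # Assumption: the action is concatenated with the features at the input.
        self.lstm = LSTMCell(rng, n_features + n_actions, n_hidden)
        self.head = init_layer(rng, 1, n_hidden)

    def q_value(self, features, action, state):
        h, c = self.lstm.step(features + action, state)
        return linear(*self.head, h)[0], (h, c)

# Hypothetical sizes: 8 image features, 16 hidden units, 3 action components.
rng = random.Random(0)
actor = Actor(rng, n_features=8, n_hidden=16, n_actions=3)
critic = Critic(rng, n_features=8, n_hidden=16, n_actions=3)
actor_state = actor.lstm.initial_state()
critic_state = critic.lstm.initial_state()

frame = [0.5] * 8   # stand-in for the output of the convolutional layers
actions = []
for _ in range(3):
    action, actor_state = actor.act(frame, actor_state)
    q, critic_state = critic.q_value(frame, action, critic_state)
    actions.append(action)
```

Feeding the same `frame` three times produces a different action at each step, because the LSTM state carried between calls changes. This recurrent memory is what lets a policy compensate for information that a single camera image does not contain.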
