Learning reliable and efficient navigation with a humanoid

Reliable and efficient navigation with a humanoid robot is a difficult task. First, motion commands are executed rather inaccurately due to backlash in the joints and foot slippage. Second, the observations are typically highly affected by noise caused by the shaking motion of the robot while walking. As a consequence, localization performance degrades while the robot moves and the uncertainty about its pose grows, so the reliable and efficient execution of a navigation task can no longer be ensured: the robot's pose estimate may not correspond to its true location. In this paper, we present a reinforcement learning approach to selecting appropriate navigation actions for a humanoid robot that uses a camera for localization. The robot learns to reach its destination reliably and as fast as possible, choosing actions that account for motion drift and trading off the speed of fast walking movements against localization accuracy. We present extensive simulated and real-world experiments with a humanoid robot and demonstrate that our learned policy significantly outperforms a hand-optimized navigation strategy.
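The abstract does not spell out the learning algorithm or the state and action spaces, so the following is only an illustrative sketch: a tabular Q-learning agent on an invented toy corridor MDP whose state combines progress toward the goal with a discretized pose-uncertainty level. The three actions (fast walk, slow walk, stand still and observe) and all rewards below are assumptions made up for the example; they merely mimic the trade-off the abstract describes, where fast walking increases pose uncertainty and stopping to observe reduces it.

```python
import random

random.seed(0)

GOAL, MAX_U = 10, 3  # corridor length and max uncertainty level (both invented)

def step(pos, u, a):
    """Toy transition model: fast walking drifts (uncertainty grows),
    observing re-localizes (uncertainty shrinks)."""
    if a == 0:    # fast walk: 2 cells forward, uncertainty +1
        pos, u = min(pos + 2, GOAL), min(u + 1, MAX_U)
    elif a == 1:  # slow walk: 1 cell forward, uncertainty unchanged
        pos = min(pos + 1, GOAL)
    else:         # observe: stand still, uncertainty -2
        u = max(u - 2, 0)
    if pos == GOAL:
        # arriving counts as success only if the pose estimate is still sharp
        return pos, u, (10.0 if u <= 1 else -10.0), True
    return pos, u, -1.0, False  # time penalty rewards fast completion

# Tabular Q-values over (position, uncertainty) states and 3 actions
Q = {(p, u): [0.0] * 3 for p in range(GOAL + 1) for u in range(MAX_U + 1)}
alpha, gamma, eps = 0.2, 0.95, 0.2

for _ in range(5000):  # epsilon-greedy Q-learning episodes
    pos, u, done = 0, 0, False
    while not done:
        s = (pos, u)
        a = random.randrange(3) if random.random() < eps \
            else max(range(3), key=lambda i: Q[s][i])
        pos, u, r, done = step(pos, u, a)
        target = r if done else r + gamma * max(Q[(pos, u)])
        Q[s][a] += alpha * (target - Q[s][a])

# Greedy rollout of the learned policy
pos, u, steps, done, r = 0, 0, 0, False, 0.0
while not done and steps < 20:
    a = max(range(3), key=lambda i: Q[(pos, u)][i])
    pos, u, r, done = step(pos, u, a)
    steps += 1
print(steps, r)
```

In this toy model the learned policy interleaves bursts of fast walking with re-localization stops, reaching the goal faster than walking slowly the whole way while still arriving with low pose uncertainty. The actual paper operates on a real robot with camera-based localization, so its state representation and reward are necessarily richer than this sketch.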
