Learning How Pedestrians Navigate: A Deep Inverse Reinforcement Learning Approach

Humans and mobile robots will increasingly share the same environments, which has led to growing interest in human-robot interaction (HRI). One important topic in this field is the development of robot navigation algorithms that are socially compliant with the humans navigating the same space. In this paper, we present a method for learning human navigation behaviors using maximum entropy deep inverse reinforcement learning (MEDIRL). As expert demonstrations, we use a large open dataset of pedestrian trajectories collected in an uncontrolled environment. Human navigation behaviors are captured by a nonlinear reward function approximated with a deep neural network (DNN). The MEDIRL algorithm takes as input features extracted from human motion trajectories, including a social affinity map (SAM). We perform simulation experiments with the learned reward function and evaluate performance by comparing the generated trajectories against the real pedestrian trajectories measured in the dataset. The results show that the proposed method achieves prediction accuracy comparable to other state-of-the-art methods, and that it generates pedestrian trajectories resembling real human trajectories, exhibiting natural social navigation behaviors such as collision avoidance, leader-follower, and split-and-rejoin.
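
At its core, MEDIRL alternates between solving the soft (maximum-entropy) forward RL problem under the current reward and updating the reward network so that the expected state visitation frequencies match those observed in the expert demonstrations. The sketch below illustrates one such update for a small tabular MDP; it is a minimal illustration under stated assumptions, not the paper's implementation. The `RewardNet` architecture, the per-state feature layout (e.g., stacked occupancy and SAM channels), and the placeholder `svf_fn` solver are all assumptions of this sketch.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """DNN reward approximator: maps per-state features (e.g., SAM channels)
    to a scalar reward per state. Architecture is a hypothetical choice."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, features):            # features: (n_states, n_features)
        return self.net(features).squeeze(-1)   # rewards: (n_states,)

def medirl_step(reward_net, optimizer, features, expert_svf, svf_fn):
    """One MEDIRL gradient step.
    expert_svf: empirical state visitation frequencies from demonstrations.
    svf_fn:     solver returning expected visitations under the soft-optimal
                policy for a given reward vector (e.g., soft value iteration
                plus a forward pass over the dynamics; assumed given here)."""
    rewards = reward_net(features)
    with torch.no_grad():
        expected_svf = svf_fn(rewards)       # (n_states,)
    # The max-ent IRL gradient w.r.t. the reward is (expert_svf - expected_svf);
    # this surrogate loss backpropagates it through the network weights.
    loss = -(rewards * (expert_svf - expected_svf)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy usage with random features and a stand-in solver: a softmax over
    # rewards replaces the visitation frequencies a real soft value iteration
    # over the MDP dynamics would produce.
    n_states, n_features = 100, 8
    feats = torch.randn(n_states, n_features)
    expert_svf = torch.rand(n_states)
    expert_svf /= expert_svf.sum()
    svf_fn = lambda r: torch.softmax(r, dim=0)
    net = RewardNet(n_features)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(200):
        medirl_step(net, opt, feats, expert_svf, svf_fn)
```

The surrogate loss is chosen so that its gradient with respect to the reward vector equals the negative maximum-entropy likelihood gradient, so one descent step on it corresponds to one ascent step on the demonstration likelihood.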
