Role Playing Learning for Socially Concomitant Mobile Robot Navigation

In this paper, we present the Role Playing Learning (RPL) scheme for a mobile robot to navigate socially with its human companion in populated environments. Neural networks (NN) are constructed to parameterize a stochastic policy that directly maps sensory data collected by the robot to its velocity outputs, while respecting a set of social norms. An efficient simulated learning environment is built with maps and pedestrian trajectories collected from a number of real-world crowd data sets. In each learning iteration, a robot equipped with the NN policy is created virtually in the learning environment to play the role of an accompanied pedestrian and navigate towards a goal in a socially concomitant manner. We therefore call this process Role Playing Learning, which is formulated under a reinforcement learning (RL) framework. The NN policy is optimized end-to-end using Trust Region Policy Optimization (TRPO), taking into account the imperfections in the robot's sensor measurements. Simulation and experimental results are provided to demonstrate the efficacy and superiority of our method.
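As a rough illustration of what "a stochastic policy that maps sensory data to velocity outputs" can look like, the sketch below uses a single linear layer followed by tanh to produce the mean of a diagonal Gaussian over a two-dimensional velocity command. This is not the network or training code from the paper: the class name `GaussianVelocityPolicy`, the observation size, the fixed `log_std`, and the weight initialization are all assumptions made for illustration. A policy-gradient method such as TRPO would consume the `log_prob` values, but the optimization step itself is not shown.

```python
import math
import random


class GaussianVelocityPolicy:
    """Toy stochastic policy (hypothetical, for illustration only).

    Maps a sensor vector to the mean of a diagonal Gaussian over a
    velocity command, here assumed to be (linear, angular).
    """

    def __init__(self, obs_dim, act_dim=2, log_std=-0.5, seed=0):
        self.rng = random.Random(seed)
        # Small random weights; a real policy would use a deeper network.
        self.weights = [
            [self.rng.gauss(0.0, 0.1) for _ in range(obs_dim)]
            for _ in range(act_dim)
        ]
        self.bias = [0.0] * act_dim
        # State-independent log standard deviation (an assumption).
        self.log_std = [log_std] * act_dim

    def mean(self, obs):
        # tanh keeps the mean velocity in [-1, 1] (normalized units).
        return [
            math.tanh(sum(w * o for w, o in zip(row, obs)) + b)
            for row, b in zip(self.weights, self.bias)
        ]

    def sample(self, obs):
        # Draw a velocity command from N(mean, diag(std^2)).
        mu = self.mean(obs)
        return [
            m + math.exp(ls) * self.rng.gauss(0.0, 1.0)
            for m, ls in zip(mu, self.log_std)
        ]

    def log_prob(self, obs, action):
        # Log-density of the action under the diagonal Gaussian.
        mu = self.mean(obs)
        total = 0.0
        for a, m, ls in zip(action, mu, self.log_std):
            z = (a - m) / math.exp(ls)
            total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
        return total


policy = GaussianVelocityPolicy(obs_dim=4)
obs = [0.8, 1.2, 0.5, -0.3]  # made-up sensor readings
action = policy.sample(obs)
print(action, policy.log_prob(obs, action))
```

Sampling rather than returning the mean directly is what makes the policy stochastic; the randomness provides exploration during learning, and the log-probability is the quantity a policy-gradient update differentiates.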
