Socially compliant mobile robot navigation via inverse reinforcement learning

Mobile robots are increasingly populating our human environments. To interact with humans in a socially compliant way, these robots need to understand and comply with mutually accepted rules. In this paper, we present a novel approach to model the cooperative navigation behavior of humans. We model their behavior in terms of a mixture distribution that captures both the discrete navigation decisions, such as going left or going right, and the natural variance of human trajectories. Our approach learns the parameters of this distribution so that it matches, in expectation, the observed behavior in terms of user-defined features. To compute the feature expectations over the resulting high-dimensional continuous distributions, we use Hamiltonian Markov chain Monte Carlo sampling. Furthermore, we rely on a Voronoi graph of the environment to efficiently explore the space of trajectories from the robot’s current position to its target position. Using the proposed model, our method is able to imitate the behavior of pedestrians or, alternatively, to replicate a specific behavior that was taught by teleoperation in the target environment of the robot. We implemented our approach on a real mobile robot and demonstrated that it successfully navigates in an office environment in the presence of humans. An extensive set of experiments suggests that our technique outperforms state-of-the-art methods for modeling pedestrian behavior, which also makes it applicable to fields such as behavioral science and computer graphics.
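
To make the feature-matching idea concrete, the following minimal Python sketch shows the core learning loop behind this kind of inverse reinforcement learning: the weights of a linear reward over trajectory features are adjusted until the feature expectations of the induced trajectory distribution match those of the demonstrations. Everything in the sketch is an illustrative assumption rather than the paper's implementation: the way-point parameterization, the two toy features, and the plain Metropolis-Hastings sampler that stands in for the Hamiltonian Markov chain Monte Carlo and Voronoi-graph machinery described above.

```python
"""Illustrative sketch (not the authors' code) of feature-matching IRL for
trajectory distributions of the form p(traj | w) proportional to exp(w . f(traj))."""
import numpy as np

rng = np.random.default_rng(0)
OBSTACLE = np.array([0.5, 0.2])  # hypothetical static obstacle used by the toy features


def features(traj):
    """Two toy features of a way-point trajectory: negative path length
    and mean clearance from the obstacle."""
    segments = np.diff(traj, axis=0)
    path_length = np.linalg.norm(segments, axis=1).sum()
    clearance = np.linalg.norm(traj - OBSTACLE, axis=1).mean()
    return np.array([-path_length, clearance])


def sample_trajectories(w, start, goal, n_samples=200, n_burn=200, step=0.05):
    """Metropolis-Hastings over the interior way-points, targeting
    p(traj) proportional to exp(w . f(traj)) with start and goal fixed.
    (The paper uses Hamiltonian MCMC; this is a simpler stand-in.)"""
    n_pts = 8
    traj = np.linspace(start, goal, n_pts)          # straight-line initialization
    log_p = w @ features(traj)
    samples = []
    for it in range(n_burn + n_samples):
        proposal = traj.copy()
        proposal[1:-1] += step * rng.standard_normal((n_pts - 2, 2))
        log_p_prop = w @ features(proposal)
        if np.log(rng.random()) < log_p_prop - log_p:  # accept/reject
            traj, log_p = proposal, log_p_prop
        if it >= n_burn:
            samples.append(traj.copy())
    return samples


def learn_weights(demo_trajs, start, goal, n_iters=50, lr=0.5):
    """Feature matching: ascend the log-likelihood, whose gradient is the
    gap between demonstrated and model feature expectations."""
    f_demo = np.mean([features(t) for t in demo_trajs], axis=0)
    w = np.zeros_like(f_demo)
    for _ in range(n_iters):
        samples = sample_trajectories(w, start, goal)
        f_model = np.mean([features(t) for t in samples], axis=0)
        w += lr * (f_demo - f_model)                 # match features in expectation
    return w


if __name__ == "__main__":
    start, goal = np.array([0.0, 0.0]), np.array([1.0, 0.0])
    # Fake "demonstrations" that bow away from the obstacle.
    bump = np.outer(np.sin(np.linspace(0.0, np.pi, 8)), [0.0, 0.3])
    demos = [np.linspace(start, goal, 8) + bump for _ in range(5)]
    print("learned feature weights:", learn_weights(demos, start, goal))
```

In the full approach, the trajectory samples would instead be drawn with Hamiltonian Monte Carlo and organized along a Voronoi graph of the environment, but the weight update itself takes the same feature-matching form.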
