Inverse reinforcement learning of behavioral models for online-adapting navigation strategies

To increase the acceptance of autonomous systems in populated environments, it is indispensable to teach them social behavior. We expect a social robot that plans its motions among humans to consider both the social acceptability of its behavior and task constraints, such as time limits. These requirements are often contradictory and therefore result in a trade-off: the robot has to decide, for example, whether it is more important to reach its goal quickly or to comply with social conventions, such as keeping an appropriate distance to humans, i.e., the robot has to adapt to task-specific priorities. In this paper, we present a method for priority-adaptive navigation of mobile autonomous systems that optimizes the social acceptability of the behavior while meeting task constraints. We learn acceptability-dependent behavioral models from human demonstrations using maximum entropy (MaxEnt) inverse reinforcement learning (IRL). These models are generative and describe the learned stochastic behavior. We then choose the optimal behavioral model by maximizing social acceptability under constraints on the expected time limit and reliability. The approach is evaluated on driving behaviors in the highway scenario of Levine et al. [27].
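As a rough illustration of the two steps described in the abstract (learning a behavioral model with MaxEnt IRL, then selecting the model that is most socially acceptable while meeting task constraints), the following sketch assumes a small tabular MDP with a transition tensor P[s, a, s'], a state-feature matrix "features", and empirical feature counts computed from demonstrations. The function names, hyperparameters, and the candidate-model attributes ("expected_time", "reliability", "acceptability") are illustrative assumptions, not taken from the paper.

    import numpy as np
    from scipy.special import logsumexp

    def soft_value_iteration(P, reward, gamma=0.95, iters=100):
        """Soft (MaxEnt) value iteration; returns a stochastic policy pi(a | s)."""
        n_states, n_actions, _ = P.shape
        V = np.zeros(n_states)
        for _ in range(iters):
            # Q(s, a) = r(s) + gamma * sum_s' P(s, a, s') * V(s')
            Q = reward[:, None] + gamma * P.dot(V)
            V = logsumexp(Q, axis=1)           # soft maximum over actions
        return np.exp(Q - V[:, None])          # rows sum to one

    def expected_feature_counts(P, policy, features, p0, horizon=50):
        """Feature expectations under the current policy, weighted by state visitation."""
        d, visitation = p0.copy(), np.zeros(len(p0))
        for _ in range(horizon):
            visitation += d
            d = np.einsum('s,sa,sat->t', d, policy, P)   # propagate state distribution
        return visitation.dot(features)

    def maxent_irl(P, features, demo_counts, p0, lr=0.05, epochs=200):
        """Gradient ascent on the MaxEnt IRL objective via feature matching."""
        theta = np.zeros(features.shape[1])
        for _ in range(epochs):
            policy = soft_value_iteration(P, features.dot(theta))
            model_counts = expected_feature_counts(P, policy, features, p0)
            theta += lr * (demo_counts - model_counts)   # empirical minus expected counts
        return theta

    def select_model(candidates, time_limit, min_reliability):
        """Among candidate behavioral models, pick the most socially acceptable one
        that still satisfies the task constraints (hypothetical model attributes)."""
        feasible = [m for m in candidates
                    if m['expected_time'] <= time_limit and m['reliability'] >= min_reliability]
        return max(feasible, key=lambda m: m['acceptability']) if feasible else None

In the paper, the behavioral models are the stochastic policies induced by acceptability-dependent reward weights; the dictionary keys used in select_model stand in for the corresponding expected quantities and are placeholders only.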

[1] Reid G. Simmons et al., "GRACE: An Autonomous Robot for the AAAI Robot Challenge," AI Magazine, 2003.

[2] Anind K. Dey et al., "Maximum Entropy Inverse Reinforcement Learning," AAAI, 2008.

[3] Jodi Forlizzi et al., "Social Robot Navigation," 2010.

[4] Wolfram Burgard et al., "Feature-Based Prediction of Trajectories for Socially Compliant Navigation," Robotics: Science and Systems, 2012.

[5] Dinesh Manocha et al., "PLEdestrians: A Least-Effort Approach to Crowd Simulation," ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA), 2010.

[6] Christian Vollmer et al., "Learning to Navigate Through Crowded Environments," IEEE International Conference on Robotics and Automation (ICRA), 2010.

[7] Andreas Krause et al., "Unfreezing the Robot: Navigation in Dense, Interacting Crowds," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2010.

[8] Wolfram Burgard et al., "Experiences with an Interactive Museum Tour-Guide Robot," Artificial Intelligence, 1999.

[9] Michael Karg et al., "Increasing Perceived Value Between Human and Robots — Measuring Legibility in Human Aware Navigation," IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO), 2012.

[10] Kai Oliver Arras et al., "Socially-Aware Robot Navigation: A Learning Approach," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012.

[11] Gonzalo Ferrer et al., "Robot Companion: A Social-Force Based Approach with Human Awareness-Navigation in Crowded Environments," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013.

[12] Adrien Treuille et al., "Continuum Crowds," ACM Transactions on Graphics, 2006.

[13] Wolfram Burgard et al., "Teaching Mobile Robots to Cooperatively Navigate in Populated Environments," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013.

[14] Kai Oliver Arras et al., "Planning Problems for Social Robots," ICAPS, 2011.

[15] Dirk Helbing et al., "Social Force Model for Pedestrian Dynamics," Physical Review E, 1995.

[16] J. Andrew Bagnell et al., "Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy," 2010.

[17] Takayuki Kanda et al., "How Do People Walk Side-by-Side? — Using a Computational Model of Human Behavior for a Social Robot," ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2012.

[18] Wolfram Burgard et al., "MINERVA: A Second-Generation Museum Tour-Guide Robot," IEEE International Conference on Robotics and Automation (ICRA), 1999.

[19] Dani Lischinski et al., "Crowds by Example," Computer Graphics Forum, 2007.

[20] Anind K. Dey et al., "Navigate Like a Cabbie: Probabilistic Reasoning from Observed Context-Aware Behavior," UbiComp, 2008.

[21] Wolfram Burgard et al., "Socially Inspired Motion Planning for Mobile Robots in Populated Environments," 2008.

[22] Dinesh Manocha et al., "Interactive Simulation of Dynamic Crowd Behaviors Using General Adaptation Syndrome Theory," ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2012.

[23] Kai Oliver Arras et al., "Please Do Not Disturb! Minimum Interference Coverage for Social Robots," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011.

[24] Anind K. Dey et al., "Modeling Interaction via the Principle of Maximum Causal Entropy," ICML, 2010.

[25] Wolfram Burgard et al., "A Navigation System for Robots Operating in Crowded Urban Environments," IEEE International Conference on Robotics and Automation (ICRA), 2013.

[26] Pieter Abbeel et al., "Apprenticeship Learning via Inverse Reinforcement Learning," ICML, 2004.

[27] Sergey Levine et al., "Continuous Inverse Optimal Control with Locally Optimal Examples," ICML, 2012.

[28] Zoran Popović et al., "Learning Behavior Styles with Inverse Reinforcement Learning," SIGGRAPH, 2010.