SESNO: Sample Efficient Social Navigation from Observation

In this paper, we present Sample Efficient Social Navigation from Observation (SESNO), an algorithm that efficiently learns socially compliant navigation policies from observations of human trajectories. SESNO is an inverse reinforcement learning (IRL)-based algorithm that learns from observed human trajectories without access to the underlying actions. We improve sample efficiency over previous IRL-based methods by introducing a shared experience replay buffer that allows past trajectory experiences to be reused for estimating both the policy and the reward. We evaluate SESNO on publicly available pedestrian motion datasets and compare its performance against related baseline methods from the literature. SESNO outperforms existing baselines while dramatically improving sample complexity, requiring as little as one-hundredth of the samples needed by those baselines.
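To make the shared-buffer idea concrete, below is a minimal sketch, not the authors' implementation: a single buffer of state-only transitions (s, s') that both the reward estimator (e.g., a discriminator contrasting agent and observed human transitions) and an off-policy policy update sample from, so each environment interaction is reused rather than discarded. All class, function, and parameter names here are illustrative assumptions.

```python
import numpy as np

class SharedReplayBuffer:
    """Stores state-only transitions (s, s'); actions are not needed,
    matching the learning-from-observation setting described above."""

    def __init__(self, capacity, state_dim):
        self.capacity = capacity
        self.states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.next_states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.ptr = 0
        self.size = 0

    def add(self, state, next_state):
        # Overwrite the oldest entry once the buffer is full (ring buffer).
        self.states[self.ptr] = state
        self.next_states[self.ptr] = next_state
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        return self.states[idx], self.next_states[idx]


def training_step(agent_buffer, demo_buffer, reward_update, policy_update,
                  batch_size=256):
    # One hypothetical off-policy step: the same agent buffer feeds both the
    # reward estimate and the policy update, so each stored transition is
    # reused many times instead of being discarded after a single on-policy pass.
    agent_batch = agent_buffer.sample(batch_size)
    demo_batch = demo_buffer.sample(batch_size)
    reward_update(agent_batch, demo_batch)  # reward learned from observations only
    policy_update(agent_batch)              # policy/critic update reuses the same samples
```

Because the buffer stores only (s, s') pairs, the same data can serve both sides of an adversarial IRL-style update without requiring action labels from the observed pedestrians; this reuse is what would be expected to drive the reported reduction in sample complexity.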
