Multi-robot inverse reinforcement learning under occlusion with interactions

We consider the problem of learning the behavior of multiple mobile robots that execute fixed trajectories in a common space and may interact with each other as they do so. The robots are observed by a subject robot from a vantage point that reveals only a portion of their trajectories. This problem has wide-ranging applications; the specific one we consider here is that of a subject robot that seeks to penetrate a simple perimeter patrol maintained by two interacting robots and reach a goal location. Our approach extends single-agent inverse reinforcement learning (IRL) to a multi-robot setting with partial observability, and models the interaction between the patrolling robots as equilibrium behavior. IRL yields weights over the features of each robot's reward function, thereby allowing us to learn its preferences. From the learned rewards, we then derive a Markov decision process (MDP) based policy for each observed robot. We extend a prominent IRL technique to this setting and empirically evaluate its performance in our application. We show that our approach significantly improves the subject's ability to predict the patrollers' positions at different points in time, with a corresponding increase in its rate of successful penetration.
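To make the final step of this pipeline concrete, the sketch below shows one way the learned feature weights could be turned into a patroller policy: the reward is modeled as a linear combination of state features, and a policy is obtained by value iteration on the resulting MDP. This is a minimal illustration under assumed inputs (the feature matrix, transition model, and weight vector are hypothetical placeholders), not the paper's actual implementation.

    # Minimal sketch (assumed setup, not the authors' code): given feature
    # weights recovered by IRL for one patroller, build its reward and derive
    # an MDP policy by value iteration over a small, tabular patrol domain.
    import numpy as np

    def derive_policy(weights, features, transitions, gamma=0.95, tol=1e-6):
        """weights:     (k,)      feature weights learned by IRL
           features:    (S, k)    feature vector phi(s) for each state
           transitions: (A, S, S) transition probabilities P[a, s, s']"""
        reward = features @ weights                 # r(s) = w . phi(s)
        num_actions, num_states, _ = transitions.shape
        V = np.zeros(num_states)
        while True:                                 # value iteration
            Q = reward[None, :] + gamma * (transitions @ V)   # shape (A, S)
            V_new = Q.max(axis=0)
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        return Q.argmax(axis=0)                     # greedy action per state

Run once per observed patroller, the subject robot could simulate the resulting policies forward from the last observed positions to predict where the patrollers will be at future time steps, which is the quantity evaluated in the penetration experiments.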
