Scaling Expectation-Maximization for Inverse Reinforcement Learning to Multiple Robots under Occlusion

We consider inverse reinforcement learning (IRL) when portions of the expert's trajectory are occluded from the learner. For example, two experts performing tasks in close proximity may block each other from the learner's view, or the learner may be a robot observing mobile robots from a fixed position with a limited sensor range. Previous methods mitigate this challenge either by focusing on the observed data only or by forming an expectation over the missing portions of the experts' trajectories given the observed data. However, not only is the resulting optimization nonlinear and nonconvex, but the space of occluded trajectories may also be very large, especially when multiple agents are observed over an extended time, which makes computing the expectation intractable. We present methods for speeding up the computation of these conditional expectations by employing blocked Gibbs sampling. On a challenging, time-limited multi-robot domain, we explore various blocking schemes and demonstrate that our methods offer significantly improved performance over existing IRL techniques under occlusion.
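In latent maximum-entropy terms, the expectation above is the E-step quantity E[sum_t phi(s_t, a_t) | Y; theta], the expected feature counts over the occluded portion Z of each trajectory given the observed portion Y and the current reward parameters. Blocked Gibbs sampling replaces the intractable sum over all of Z with alternating draws of short blocks from their exact conditionals. Below is a minimal sketch of such an estimator, not the paper's implementation: it assumes a discrete MDP with a tabular policy pi[s, a], transition model T[s, a, s'], and feature tensor phi[s, a] (all hypothetical names), that each occluded block is short enough to enumerate, and that each trajectory begins with an observed step.

import itertools
import numpy as np

def resample_block(traj, block, pi, T, states, actions, rng):
    """Draw one occluded block from its conditional given the rest of traj.

    Enumerates every (state, action) fill-in of the block, which is only
    feasible because blocks are kept short -- the point of blocking."""
    fills = list(itertools.product(
        itertools.product(states, actions), repeat=len(block)))
    weights = np.empty(len(fills))
    for i, fill in enumerate(fills):
        prev_s, prev_a = traj[block[0] - 1]      # left anchor (t=0 observed)
        w = 1.0
        for s, a in fill:
            w *= T[prev_s, prev_a, s] * pi[s, a]  # chain rule along the block
            prev_s, prev_a = s, a
        if block[-1] + 1 < len(traj):             # right anchor, if any
            w *= T[prev_s, prev_a, traj[block[-1] + 1][0]]
        weights[i] = w
    # Assumes at least one fill-in has positive probability under the model.
    pick = rng.choice(len(fills), p=weights / weights.sum())
    for t, sa in zip(block, fills[pick]):
        traj[t] = sa

def expected_feature_counts(traj, blocks, pi, T, phi, states, actions,
                            n_samples=200, burn_in=50, seed=0):
    """Monte Carlo estimate of E[sum_t phi(s_t, a_t) | observed steps]."""
    rng = np.random.default_rng(seed)
    for block in blocks:                          # arbitrary initialization
        for t in block:
            traj[t] = (rng.choice(states), rng.choice(actions))
    counts = np.zeros(phi.shape[-1])
    for it in range(burn_in + n_samples):
        for block in blocks:                      # one Gibbs sweep
            resample_block(traj, block, pi, T, states, actions, rng)
        if it >= burn_in:
            counts += sum(phi[s, a] for s, a in traj)
    return counts / n_samples

Because each block is redrawn from its exact conditional given the current values of every other timestep, a full sweep is a valid Gibbs update; larger blocks trade enumeration cost for faster mixing when neighboring occluded steps are strongly correlated, which is the trade-off the blocking schemes in the abstract explore.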
