Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement

This paper studies Learning from Observations (LfO) for imitation learning with access to state-only demonstrations. In contrast to Learning from Demonstration (LfD), which involves both action and state supervision, LfO is more practical because it can leverage previously inapplicable resources (e.g., videos), yet it is more challenging due to the incomplete expert guidance. In this paper, we investigate LfO and its difference from LfD in both theoretical and practical terms. We first prove that, when following the modeling approach of GAIL, the gap between LfD and LfO lies in the disagreement between the inverse dynamics models of the imitator and the expert. More importantly, we show that this gap is upper-bounded by a negative causal entropy, which can be minimized in a model-free way. We term our method Inverse-Dynamics-Disagreement-Minimization (IDDM); it enhances conventional LfO methods by further bridging the gap to LfD. Extensive empirical results on challenging benchmarks indicate that our method attains consistent improvements over other LfO counterparts.
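
To make the high-level idea concrete, below is a minimal sketch (not the authors' implementation) of how a GAIL-style LfO reward could be augmented with a causal-entropy bonus, which is what the bound above suggests. The function name `iddm_style_reward`, its arguments, and the weight `lambda_ent` are illustrative assumptions, not quantities defined in the paper.

```python
import numpy as np

def iddm_style_reward(discriminator_logit, policy_entropy, lambda_ent=0.1):
    """Per-step reward for a GAIL-like LfO learner (illustrative sketch only).

    discriminator_logit: logit of a state-transition discriminator D(s, s')
        trained to separate expert transitions from imitator transitions.
    policy_entropy: an estimate of the policy's causal entropy at this step.
    lambda_ent: weight on the entropy bonus; since the LfO-LfD gap is upper
        bounded by a negative causal entropy, encouraging higher entropy
        tightens that gap.
    """
    d = 1.0 / (1.0 + np.exp(-discriminator_logit))  # D(s, s') in (0, 1)
    gail_like_reward = -np.log(1.0 - d + 1e-8)      # GAIL-style reward on the state transition
    return gail_like_reward + lambda_ent * policy_entropy

# Toy usage: a transition the discriminator finds expert-like, with a modest entropy estimate.
print(iddm_style_reward(discriminator_logit=1.5, policy_entropy=0.8))
```

In practice the entropy bonus would be folded into whatever model-free policy-gradient update is used (e.g., on top of a standard adversarial imitation objective); the sketch only shows where the extra term enters the per-step reward.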
