Probability Density Estimation Based Imitation Learning

Imitation Learning (IL) is an effective learning paradigm that exploits the interactions between agents and environments. It does not require explicit reward signals and instead recovers desired policies from expert demonstrations. In general, IL methods can be categorized into Behavioral Cloning (BC) and Inverse Reinforcement Learning (IRL). In this work, a novel reward function based on probability density estimation is proposed for IRL, which significantly reduces the complexity of existing IRL methods. Furthermore, we prove that the theoretically optimal policy derived from our reward function is identical to the expert policy, provided the expert policy is deterministic. Consequently, an IRL problem can be gracefully transformed into a probability density estimation problem. Based on the proposed reward function, we present a “watch-try-learn” style framework named Probability Density Estimation based Imitation Learning (PDEIL), which works in both discrete and continuous action spaces. Finally, comprehensive experiments in Gym environments show that PDEIL is much more efficient than existing algorithms at recovering rewards close to the ground truth.

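To make the density-based reward idea concrete, the sketch below shows one way such a reward could be built: a kernel density estimator is fitted to expert state-action pairs (the “watch” phase), and the estimated density is then used as a reward signal for an ordinary RL learner (the “try” and “learn” phases). This is a minimal illustration under assumed continuous states and actions; the class name DensityReward, the Gaussian-KDE choice, and the exact reward form are illustrative assumptions, not the paper's precise definition.

    # Illustrative sketch only: reward derived from a density estimate over
    # expert demonstrations. The exact PDEIL reward may differ.
    import numpy as np
    from sklearn.neighbors import KernelDensity

    class DensityReward:
        def __init__(self, expert_states, expert_actions, bandwidth=0.2):
            # "Watch" phase: fit a kernel density estimator on expert
            # (state, action) pairs stacked as feature vectors.
            expert_sa = np.concatenate([expert_states, expert_actions], axis=1)
            self.kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
            self.kde.fit(expert_sa)

        def __call__(self, state, action):
            # Higher estimated density under the expert distribution
            # yields a higher reward for the learning agent.
            sa = np.concatenate([state, action]).reshape(1, -1)
            log_density = self.kde.score_samples(sa)[0]
            return float(np.exp(log_density))

    # "Try"/"learn" phases (assumed usage): plug the fitted reward into any
    # off-the-shelf RL algorithm in place of the environment reward.
    # reward_fn = DensityReward(demo_states, demo_actions)
    # r = reward_fn(current_state, current_action)

The design choice here is that states and actions the expert visits often receive large rewards, so maximizing the surrogate return pushes the agent toward the expert's behavior; how the density estimate is normalized or combined with the agent's own density is left open in this sketch.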