VILD: Variational Imitation Learning with Diverse-quality Demonstrations

The goal of imitation learning (IL) is to learn a good policy from high-quality demonstrations. In reality, however, demonstration quality is often diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs. IL in such situations can be challenging, especially when the level of demonstrators' expertise is unknown. We propose a new IL method called \underline{v}ariational \underline{i}mitation \underline{l}earning with \underline{d}iverse-quality demonstrations (VILD), which explicitly models the level of demonstrators' expertise with a probabilistic graphical model and estimates it jointly with a reward function. We show that a naive approach to this estimation does not scale to large state and action spaces, and address its shortcomings with a variational approach that can be easily implemented using existing reinforcement learning methods. Experiments on continuous-control benchmarks demonstrate that VILD outperforms state-of-the-art methods. Our work enables scalable and data-efficient IL under more realistic settings than previously considered.
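To make the modeling idea concrete, here is a minimal sketch of one plausible graphical model for diverse-quality demonstrations. The Gaussian action-noise assumption and the symbols below (per-demonstrator noise level $\sigma_k$, optimal policy $\pi_r$ under the learned reward $r$, variational distribution $q$) are illustrative notation for this sketch only, not necessarily the paper's exact formulation:

% Sketch: each demonstrator k corrupts the intended action of the
% reward-optimal policy with demonstrator-specific Gaussian noise.
\begin{align*}
  \bar{a} &\sim \pi_r(\bar{a} \mid s)
      && \text{intended (expert-level) action} \\
  a &\sim \mathcal{N}\!\bigl(a \mid \bar{a},\, \sigma_k^2 I\bigr)
      && \text{observed action of demonstrator } k \\
  \log p(\mathcal{D} \mid r, \sigma_{1:K})
      &\ge \mathbb{E}_{q(\bar{a} \mid s, a, k)}\!\left[
           \log \frac{\mathcal{N}(a \mid \bar{a}, \sigma_k^2 I)\,\pi_r(\bar{a} \mid s)}
                     {q(\bar{a} \mid s, a, k)} \right]
      && \text{variational lower bound}
\end{align*}

Under this sketch, maximizing the lower bound jointly over the reward $r$, the noise levels $\sigma_{1:K}$, and $q$ estimates each demonstrator's expertise (a small $\sigma_k$ indicates high quality) alongside the reward, which is the role the variational approach plays in the abstract above.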
