f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning

Imitation learning (IL) aims to learn, from expert demonstrations, a policy that minimizes the discrepancy between the learner's and the expert's behaviors. Various IL algorithms have been proposed, each with a different pre-determined divergence to quantify this discrepancy. This naturally raises the question: given a set of expert demonstrations, which divergence recovers the expert policy more accurately and with higher data efficiency? In this work, we propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model that automatically learns a discrepancy measure from the $f$-divergence family, along with a policy capable of producing expert-like behaviors. Compared with IL baselines that use predefined divergence measures, $f$-GAIL learns better policies with higher data efficiency on six physics-based control tasks.
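To make the role of the divergence concrete: every $f$-divergence $D_f(P \| Q) = \mathbb{E}_{x \sim Q}[f(p(x)/q(x))]$, with $f$ convex and $f(1) = 0$, admits the variational lower bound $D_f(P \| Q) \geq \sup_T \mathbb{E}_{x \sim P}[T(x)] - \mathbb{E}_{x \sim Q}[f^*(T(x))]$, where $f^*$ is the convex conjugate of $f$. GAIL-style methods maximize this bound over a discriminator $T$ applied to (state, action) samples while the policy is trained to minimize it. Below is a minimal, hypothetical PyTorch sketch of this bound; the MLP discriminator and the fixed KL conjugate $f^*(t) = e^{t-1}$ are illustrative stand-ins, whereas $f$-GAIL learns $f^*$ itself (subject to convexity constraints) rather than fixing it in advance.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Illustrative T(s, a) network; not the paper's exact architecture."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, sa):
        return self.net(sa)

def f_divergence_bound(T, f_star, expert_sa, learner_sa):
    """Variational lower bound on D_f between expert and learner
    occupancy measures: E_expert[T(s,a)] - E_learner[f*(T(s,a))]."""
    return T(expert_sa).mean() - f_star(T(learner_sa)).mean()

# Toy usage with a fixed f: f(u) = u log u (KL divergence), whose
# convex conjugate is f*(t) = exp(t - 1). f-GAIL would learn f* instead.
f_star = lambda t: torch.exp(t - 1.0)

T = Discriminator(in_dim=10)      # assumed 10-dim (state, action) features
expert_sa = torch.randn(32, 10)   # stand-in expert samples
learner_sa = torch.randn(32, 10)  # stand-in policy rollouts

bound = f_divergence_bound(T, f_star, expert_sa, learner_sa)
bound.backward()  # gradient ascent on T's parameters tightens the bound;
                  # the policy is then updated to decrease it
```

In the full adversarial loop, this maximization alternates with a policy-gradient step (e.g., TRPO) that treats the bound as a cost, so the policy drives the learned divergence between its occupancy measure and the expert's toward zero.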
