On Computation and Generalization of Generative Adversarial Imitation Learning

Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies. Unlike Reinforcement Learning (RL), GAIL takes advantage of demonstration data from experts (e.g., humans) and learns both the policy and the reward function of the unknown environment. Despite significant empirical progress, the theory behind GAIL remains largely unknown. The major difficulty comes from the underlying temporal dependency of the demonstration data and the minimax computational formulation of GAIL, which lacks convex-concave structure. To bridge this gap between theory and practice, this paper investigates the theoretical properties of GAIL. Specifically, we show: (1) for GAIL with general reward parameterization, generalization can be guaranteed as long as the class of reward functions is properly controlled; (2) when the reward is parameterized as a reproducing kernel function, GAIL can be efficiently solved by stochastic first-order optimization algorithms that attain sublinear convergence to a stationary solution. To the best of our knowledge, these are the first results on statistical and computational guarantees of imitation learning with reward/policy function approximation. Numerical experiments are provided to support our analysis.
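As background for the two results above, a minimal sketch of the GAIL minimax objective they refer to, written here in standard GAIL notation that does not appear verbatim in the abstract (occupancy measures \rho_{\pi} and \rho_{\pi_E}, a reward class \mathcal{F}, and an optional causal-entropy regularizer \lambda H(\pi); signs follow the common convention and may differ from the paper's):

% Policy \pi is trained against a reward (discriminator) r drawn from a class \mathcal{F}:
%   \rho_{\pi}, \rho_{\pi_E} : occupancy measures of the learned and expert policies
%   H(\pi)                   : causal entropy regularizer with weight \lambda \ge 0
\min_{\pi} \; \max_{r \in \mathcal{F}} \;
  \mathbb{E}_{(s,a) \sim \rho_{\pi_E}}\!\big[ r(s,a) \big]
  - \mathbb{E}_{(s,a) \sim \rho_{\pi}}\!\big[ r(s,a) \big]
  - \lambda H(\pi).

The absence of convex-concave structure mentioned above arises from this objective: the inner problem is well behaved only for suitably restricted reward classes (e.g., a ball in a reproducing kernel Hilbert space), while the outer problem is generally nonconvex in the policy, which is why the computational analysis targets convergence to a stationary solution rather than a global saddle point.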
