State Alignment-based Imitation Learning

Consider an imitation learning problem in which the imitator and the expert have different dynamics models. Most existing imitation learning methods fail in this setting because they focus on imitating actions. We propose a novel state alignment-based imitation learning method that trains the imitator to follow the state sequences in the expert demonstrations as closely as possible. The state alignment is enforced from both a local and a global perspective, and we combine the two in a reinforcement learning framework through a regularized policy update objective. We show the superiority of our method both in standard imitation learning settings and in the challenging setting where the expert and the imitator have different dynamics models.

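To make the idea of a combined, regularized objective concrete, the following is a minimal sketch (not the authors' implementation): it assumes a hypothetical learned local predictor of the expert's next state and a hypothetical Wasserstein-style critic over visited states, fuses them into a state-alignment reward, and regularizes a policy-gradient surrogate with a KL term toward a prior suggested by the alignment signal.

```python
# Minimal sketch of a state-alignment reward plus a regularized policy update.
# `expert_next_state` and `wasserstein_critic` are hypothetical stand-ins for
# the paper's local and global alignment components; details differ in the paper.
import torch


def state_alignment_reward(next_state, expert_next_state, visited_states,
                           wasserstein_critic, beta=0.5):
    """Local term: stay close to the demonstrated next state.
    Global term: score the visited states under a Wasserstein-style critic."""
    local = -torch.norm(next_state - expert_next_state, dim=-1)
    global_ = wasserstein_critic(visited_states).squeeze(-1)
    return local + beta * global_


def regularized_policy_loss(log_probs, advantages, policy_dist, prior_dist,
                            kl_coef=0.1):
    """Policy-gradient surrogate plus a KL regularizer that keeps the policy
    close to actions suggested by the state-alignment prior."""
    pg_loss = -(log_probs * advantages.detach()).mean()
    kl = torch.distributions.kl_divergence(policy_dist, prior_dist).mean()
    return pg_loss + kl_coef * kl
```

In such a sketch, the alignment reward would replace the environment reward inside any standard on-policy learner (e.g., PPO or TRPO), while the KL term acts as the regularizer on the policy update.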