End-to-end inverse reinforcement learning via a boosting approach with relative entropy

Abstract Inverse reinforcement learning (IRL) imitates expert behavior by recovering a reward function from demonstrations. This study proposes a model-free IRL algorithm that addresses the problem of predicting the unknown reward function. The proposed end-to-end model comprises two autoencoders arranged in a parallel, dual structure. It uses a state-encoding method to reduce the computational complexity of high-dimensional environments and an AdaBoost classifier to discriminate between the predicted and demonstrated reward functions. Relative entropy serves as the metric for measuring the difference between the demonstrated and the imitated behavior. Simulation experiments demonstrate the effectiveness of the proposed method in terms of the number of iterations required for reward estimation.
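To make the pipeline described in the abstract concrete, the following Python sketch illustrates the three ingredients under stated assumptions; it is not the authors' implementation. It assumes a pre-trained autoencoder for state encoding, uses scikit-learn's AdaBoostClassifier to separate expert-generated from policy-generated latent samples, and estimates relative entropy (KL divergence) between the resulting reward distributions as a convergence signal. All function names and the histogram-based KL estimate are illustrative choices.

```python
# Illustrative sketch only: hypothetical names, not the paper's actual code.
import numpy as np
from scipy.stats import entropy                     # entropy(p, q) = KL(p || q)
from sklearn.ensemble import AdaBoostClassifier

def encode(states, encoder):
    """Project raw high-dimensional states into the autoencoder's latent space.
    `encoder` is assumed to be a trained model exposing a predict() method."""
    return encoder.predict(states)

def discriminator_reward(expert_z, policy_z):
    """Fit AdaBoost to tell expert latents from policy latents; its confidence
    that a sample is expert-like serves as a proxy reward for the policy."""
    X = np.vstack([expert_z, policy_z])
    y = np.hstack([np.ones(len(expert_z)), np.zeros(len(policy_z))])
    clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
    return clf.predict_proba(policy_z)[:, 1]        # P(expert-like)

def relative_entropy_gap(expert_rewards, policy_rewards, bins=20):
    """KL divergence between histograms of demonstrated and imitated rewards,
    used here as a stopping / progress metric for the IRL iterations."""
    lo = min(expert_rewards.min(), policy_rewards.min())
    hi = max(expert_rewards.max(), policy_rewards.max())
    p, _ = np.histogram(expert_rewards, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(policy_rewards, bins=bins, range=(lo, hi), density=True)
    return entropy(p + 1e-8, q + 1e-8)              # smoothed to avoid zeros
```

In such a loop, the policy would be improved against the proxy reward each iteration, and training would stop once the relative-entropy gap falls below a chosen threshold.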
