Semi-Supervised Imitation Learning with Mixed Qualities of Demonstrations for Autonomous Driving

In this paper, we consider the problem of autonomous driving using imitation learning in a semi-supervised manner. In particular, both labeled and unlabeled demonstrations are leveraged during training by estimating the quality of each unlabeled demonstration. If the provided demonstrations are corrupted and have a low signal-to-noise ratio, the performance of the imitation learning agent can degrade significantly. To mitigate this problem, we propose a method called semi-supervised imitation learning (SSIL). SSIL first learns to evaluate the reliability of each state-action pair in the unlabeled demonstrations, assigning higher reliability values to pairs that are similar to the labeled expert demonstrations. This reliability value is called leverage. After this discrimination step, the policy is trained in a semi-supervised manner on both the labeled demonstrations and the unlabeled demonstrations with their estimated leverage values. Experiments with unlabeled trajectories of mixed quality demonstrate the validity of the proposed algorithm. Moreover, hardware experiments with an RC car show that the proposed method can be applied to real-world settings.
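
The two-stage idea described above (estimate a leverage for each unlabeled state-action pair, then train the policy with those leverages as weights) can be illustrated with a minimal sketch. This is not the SSIL algorithm itself: the abstract does not specify the discriminator or the policy class, so the sketch assumes a Gaussian-kernel similarity to the nearest expert pair as a stand-in leverage and a linear policy fitted by weighted least squares. The function names, the `bandwidth` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def estimate_leverage(expert_sa, unlabeled_sa, bandwidth=0.2):
    """Assumed stand-in for leverage: Gaussian-kernel similarity between each
    unlabeled state-action pair and its nearest labeled expert pair, in (0, 1]."""
    # Pairwise squared distances, shape (num_unlabeled, num_expert).
    d2 = ((unlabeled_sa[:, None, :] - expert_sa[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2.min(axis=1) / (2.0 * bandwidth ** 2))

def fit_weighted_linear_policy(states, actions, weights, ridge=1e-6):
    """Weighted ridge regression for a linear policy a = [s, 1] @ theta."""
    X = np.hstack([states, np.ones((len(states), 1))])
    W = weights[:, None]
    A = X.T @ (W * X) + ridge * np.eye(X.shape[1])
    b = X.T @ (W * actions)
    return np.linalg.solve(A, b)

# Toy data (hypothetical): the expert steers with a = 2 s.
rng = np.random.default_rng(0)
s_exp = rng.uniform(-1.0, 1.0, size=(50, 1))
a_exp = 2.0 * s_exp

# Unlabeled demonstrations of mixed quality: half expert-like, half corrupted.
s_good = rng.uniform(-1.0, 1.0, size=(100, 1))
a_good = 2.0 * s_good
s_bad = rng.uniform(-1.0, 1.0, size=(100, 1))
a_bad = -2.0 * s_bad

expert_sa = np.hstack([s_exp, a_exp])
lev_good = estimate_leverage(expert_sa, np.hstack([s_good, a_good]))
lev_bad = estimate_leverage(expert_sa, np.hstack([s_bad, a_bad]))

# Stage 2: labeled pairs get weight 1, unlabeled pairs get their leverage.
states = np.vstack([s_exp, s_good, s_bad])
actions = np.vstack([a_exp, a_good, a_bad])
weights = np.concatenate([np.ones(len(s_exp)), lev_good, lev_bad])

theta_weighted = fit_weighted_linear_policy(states, actions, weights)
theta_uniform = fit_weighted_linear_policy(states, actions, np.ones(len(states)))

print("mean leverage (expert-like):", lev_good.mean())
print("mean leverage (corrupted):  ", lev_bad.mean())
print("slope with leverage weights:", theta_weighted[0, 0])
print("slope with uniform weights: ", theta_uniform[0, 0])
```

In this toy setting the corrupted pairs receive low leverage except near the origin, where they happen to coincide with expert behavior, so the leverage-weighted fit stays close to the expert slope while the uniformly weighted fit is pulled toward the corrupted data.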
