Columba: A New Approach to Train an Agent for Autonomous Driving

For autonomous driving in highly complex scenarios, existing research applies deep reinforcement learning or imitation learning to obtain an agent's decision-making capability. However, because such driving scenarios provide only incomplete information, existing techniques often suffer from incorrect rewards or unstable training, which seriously degrades learning quality. In this paper, we propose a new approach named Columba, which trains the agent to learn from expert trajectory data and abnormal trajectory data instead of relying on any manually designed reward function. In particular, Columba introduces a positive-and-negative feedback regulator that reduces the dangerous or undesirable states the car agent visits at the beginning of training. Further, Columba generates rewards by coordinating the discriminator, a random network distillation module, and the regulator, improving the accuracy of the rewards. We conduct extensive experiments on the TORCS simulation platform. Experimental results show that the agent trained by Columba outperforms agents trained by DDPG and GAIL, which are strong baselines in deep reinforcement learning and imitation learning, respectively.
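The reward composition described above (a discriminator term, a random network distillation bonus, and a positive/negative regulator) can be illustrated with a minimal sketch. The function names, the logistic discriminator term, the scaling factor `alpha`, and the sign-based regulator below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def rnd_bonus(state, target_w, predictor_w):
    """Random network distillation (assumed form): the prediction error
    of a trainable predictor against a fixed random target network is
    used as a novelty bonus for unfamiliar states."""
    target = np.tanh(target_w @ state)    # fixed, randomly initialized net
    pred = np.tanh(predictor_w @ state)   # predictor trained to match it
    return float(np.sum((target - pred) ** 2))

def combined_reward(d_prob, state, target_w, predictor_w,
                    regulator_sign, alpha=0.5):
    """Hypothetical composite reward: a GAIL-style discriminator term
    plus an RND novelty bonus, gated by a regulator that outputs +1 for
    expert-like states and -1 for abnormal (dangerous) states."""
    gail_term = -np.log(1.0 - d_prob + 1e-8)  # higher when the state looks expert-like
    bonus = alpha * rnd_bonus(state, target_w, predictor_w)
    return regulator_sign * (gail_term + bonus)
```

With this gating, a state drawn from the abnormal trajectory data flips the sign of the shaped reward, penalizing the agent for visiting it; this matches the abstract's idea of using negative feedback early in training, though the exact mechanism is an assumption here.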