Improving Generalization of Transfer Learning Across Domains Using Spatio-Temporal Features in Autonomous Driving

Training vision-based autonomous driving systems in the real world can be inefficient and impractical. Vehicle simulation can instead be used to learn in a virtual world, and the acquired skills can then be transferred to handle real-world scenarios more effectively. Some visual features are consistent between the virtual and real domains, such as the relative distance to road edges and to other vehicles over time. These features are intuitively crucial for human decision making during driving. We hypothesize that such spatio-temporal factors can also be used in transfer learning to improve generalization across domains. First, we propose a CNN+LSTM transfer learning framework to extract the spatio-temporal features representing vehicle dynamics from scenes. Next, we conduct an ablation study to quantitatively estimate the significance of various features in the decisions of driving systems. We observe that physically interpretable factors are highly correlated with network decisions, while representational differences between scenes are not. Finally, based on the results of the ablation study, we propose a transfer learning pipeline that uses saliency maps and physical features extracted from a source model to enhance the performance of a target model. Training of our network is initialized with the learned weights from the CNN and LSTM latent features (capturing the intrinsic physics of the moving vehicle with respect to its surroundings) transferred from one domain to another. Our experiments show that the proposed transfer learning framework generalizes better to unseen domains than a baseline CNN model on a binary classification task.
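
The weight-transfer step described above can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation: the class name `CNNLSTMClassifier`, the helper `transfer_weights`, the layer sizes, and the input resolution are all assumptions made for illustration, and the saliency-map and physical-feature components of the pipeline are not covered.

```python
import torch
import torch.nn as nn


class CNNLSTMClassifier(nn.Module):
    """Hypothetical per-frame CNN encoder followed by an LSTM over the frame sequence."""

    def __init__(self, feat_dim=32, hidden_dim=64):
        super().__init__()
        # Small illustrative CNN; the layer sizes are assumptions, not from the paper.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, feat_dim, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # one logit for the binary task

    def forward(self, clips):
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        # Encode every frame independently, then restore the time axis.
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        # Use the final hidden state of the LSTM as the clip representation.
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1]).squeeze(-1)


def transfer_weights(source, target):
    """Copy the source CNN and LSTM weights into the target; the head keeps its fresh init."""
    target.cnn.load_state_dict(source.cnn.state_dict())
    target.lstm.load_state_dict(source.lstm.state_dict())
    return target


torch.manual_seed(0)
source_model = CNNLSTMClassifier()  # stands in for a model trained on the source domain
target_model = transfer_weights(source_model, CNNLSTMClassifier())
logits = target_model(torch.randn(2, 4, 3, 64, 64))  # 2 random clips of 4 frames each
```

In this sketch the target model would then be fine-tuned on target-domain data; whether the transferred layers are frozen or updated during that step is a design choice the abstract does not specify.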
