SSTN: Self-Supervised Domain Adaptation Thermal Object Detection for Autonomous Driving

Perception of the environment plays a decisive role in the safe and secure operation of autonomous vehicles. Perceiving the surroundings is closely analogous to human vision: the human brain perceives the environment through multiple sensory channels and develops a view-invariant representation. In the same spirit, autonomous vehicles deploy exteroceptive sensors such as cameras and LiDAR to perceive the environment. These sensors have proven beneficial in the visible-spectrum domain, but their performance degrades in adverse conditions; for instance, they have limited operational capability at night, which can lead to fatal accidents. This work explores thermal object detection by learning a view-invariant representation with a self-supervised contrastive learning approach. We propose a deep neural network, the Self-Supervised Thermal Network (SSTN), which uses contrastive learning to learn feature embeddings that maximize the information shared between the visible and infrared spectrum domains. These learned feature representations are then used for thermal object detection with a multi-scale encoder-decoder transformer network. The proposed method is extensively evaluated on two publicly available datasets, the FLIR-ADAS dataset and the KAIST Multi-Spectral dataset, and the experimental results demonstrate its effectiveness.
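The contrastive objective described above can be illustrated with a minimal sketch of a symmetric, InfoNCE-style loss between visible and thermal embeddings. This is not the SSTN implementation: it assumes that each batch contains paired visible and thermal frames of the same scene (row i of each matrix is a positive pair, all other rows are negatives), that the embeddings have already been produced by some encoder, and that the function name `cross_modal_info_nce` and the temperature of 0.07 are hypothetical choices made for illustration only.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm (eps guards against division by zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def cross_modal_info_nce(z_rgb, z_thermal, temperature=0.07):
    """Hypothetical symmetric InfoNCE loss between paired visible/thermal embeddings.

    Assumption: row i of z_rgb and row i of z_thermal come from the same scene
    (positive pair); every other row in the batch acts as a negative.
    The temperature value is an illustrative default, not taken from the paper.
    """
    a = l2_normalize(z_rgb)
    b = l2_normalize(z_thermal)
    logits = a @ b.T / temperature  # (N, N) cosine similarities scaled by temperature
    idx = np.arange(logits.shape[0])

    def cross_entropy_on_diagonal(l):
        # Softmax cross-entropy with the matching index (diagonal) as the target,
        # computed stably by subtracting the row maximum before exponentiating.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[idx, idx].mean()

    # Average the visible->thermal and thermal->visible directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 128))
    aligned = cross_modal_info_nce(z, z + 0.01 * rng.normal(size=z.shape))
    mismatched = cross_modal_info_nce(z, np.roll(z, 1, axis=0))  # every pair is wrong
    print(aligned < mismatched)  # True: correctly paired views give a lower loss
```

Averaging both directions keeps the loss symmetric, so neither modality is treated as the anchor; minimizing it pulls each visible embedding toward its thermal counterpart and away from the other scenes in the batch, which is one common way to realize the "maximize shared information" goal stated above.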
