Utilising Visual Attention Cues for Vehicle Detection and Tracking

Advanced Driver-Assistance Systems (ADAS) have attracted attention from many researchers. Vision-based sensors offer the closest way to emulate the visual behaviour of a human driver. In this paper, we explore ways of using visual attention (saliency) for object detection and tracking. We investigate: 1) how visual attention maps, namely a \emph{subjectness} attention (saliency) map and an \emph{objectness} attention map, can facilitate region proposal generation in a two-stage object detector; 2) how a visual attention map can be used for tracking multiple objects. We propose a neural network that simultaneously detects objects and generates objectness and subjectness maps, saving computational power. We further exploit the visual attention map during tracking using a sequential Monte Carlo probability hypothesis density (PHD) filter. Experiments are conducted on the KITTI and DETRAC datasets. The use of visual attention and hierarchical features yields a considerable improvement of $\approx$8\% in object detection, which in turn increases tracking performance by $\approx$4\% on the KITTI dataset.
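To make the tracking idea concrete, the sketch below shows one predict/update cycle of a sequential Monte Carlo PHD filter in which each particle's detection likelihood is modulated by a visual-attention (saliency) map. This is a minimal illustrative sketch, not the authors' implementation: the random-walk motion model, the parameter values, and the use of the saliency value as a single pseudo-likelihood field are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def smc_phd_step(particles, weights, saliency,
                 p_survive=0.99, p_detect=0.9, clutter=1e-3):
    """One predict/update cycle of an SMC-PHD filter where the detection
    likelihood is taken from a saliency map (illustrative sketch only).

    particles : (N, 2) array of (x, y) image positions
    weights   : (N,)   array of PHD particle weights
    saliency  : (H, W) array of attention values in [0, 1]
    """
    # Predict: random-walk motion model, weights scaled by survival probability.
    particles = particles + rng.normal(scale=2.0, size=particles.shape)
    weights = p_survive * weights

    # Keep particles inside the map so indexing stays valid.
    h, w = saliency.shape
    particles[:, 0] = np.clip(particles[:, 0], 0, w - 1)
    particles[:, 1] = np.clip(particles[:, 1], 0, h - 1)

    # Update: read the saliency value at each particle as a pseudo-likelihood
    # (replacing the per-measurement likelihood of the standard PHD update).
    lik = saliency[particles[:, 1].astype(int), particles[:, 0].astype(int)]
    denom = clutter + p_detect * np.sum(weights * lik)
    weights = weights * (1.0 - p_detect + p_detect * lik / denom)
    return particles, weights
```

In the full PHD update the likelihood term is summed over all measurements; here a single saliency field stands in for that sum, so particles drifting onto salient regions retain weight while those in low-attention background are suppressed. The sum of the weights approximates the expected number of targets in view.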
