ID-YOLO: Real-Time Salient Object Detection Based on the Driver’s Fixation Region

Object detection is an important task for self-driving vehicles and advanced driver assistance systems (ADASs). Visual selective attention is a crucial neural mechanism in a driver’s visual system that rapidly filters out unnecessary visual information in a driving scene. Some existing models approach the problem purely from a computer vision perspective and detect all objects in the driving scene. However, in a rapidly changing driving environment, detecting the salient or critical objects that appear in the areas drivers attend to, or in safety-relevant areas, is more useful for ADASs. In this paper, we detect salient and critical objects based on drivers’ fixation regions. To this end, we built an augmented eye tracking object detection (ETOD) dataset based on driving videos with multiple drivers’ eye movements collected by Deng et al. Furthermore, we propose a real-time salient object detection network named increase-decrease YOLO (ID-YOLO) to discriminate the critical objects within the drivers’ fixation region. The proposed ID-YOLO shows excellent detection of the major objects that drivers attend to while driving. Unlike existing object detection models in autonomous and assisted driving systems, our object detection framework simulates the selective attention mechanism of drivers. It therefore does not detect every object appearing in a driving scene, only those most relevant to driving safety. This can largely reduce interference from irrelevant scene information, showing potential for practical application in intelligent or assisted driving systems.
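To make the idea of restricting detection to the fixation region concrete, the following is a minimal sketch and not the ID-YOLO network itself, which learns this selection end to end. It assumes, purely for illustration, that the fixation region is available as an axis-aligned rectangle and that candidate objects are given as bounding boxes. The names `Box`, `overlap_fraction`, `select_salient`, and the `min_fraction` threshold are hypothetical and do not come from the paper.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def overlap_fraction(obj: Box, region: Box) -> float:
    """Return the fraction of obj's area that lies inside region."""
    inter_w = min(obj.x2, region.x2) - max(obj.x1, region.x1)
    inter_h = min(obj.y2, region.y2) - max(obj.y1, region.y1)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the rectangles do not intersect
    area = (obj.x2 - obj.x1) * (obj.y2 - obj.y1)
    return (inter_w * inter_h) / area if area > 0 else 0.0


def select_salient(boxes: List[Box], region: Box, min_fraction: float = 0.5) -> List[Box]:
    """Keep only boxes that lie mostly inside the fixation region.

    min_fraction is an assumed threshold chosen for illustration.
    """
    return [b for b in boxes if overlap_fraction(b, region) >= min_fraction]


# Example: three candidate objects and one assumed fixation rectangle.
candidates = [Box(0, 0, 10, 10), Box(100, 100, 120, 120), Box(8, 0, 18, 10)]
fixation = Box(0, 0, 12, 12)
salient = select_salient(candidates, fixation)
```

In this example only the first box is kept: the second lies entirely outside the fixation rectangle, and only 40% of the third falls inside it. A rule of this kind could plausibly be used to derive fixation-based labels from a full set of annotations, but the actual ETOD labeling procedure is not described in this section.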
