How Do Drivers Allocate Their Potential Attention? Driving Fixation Prediction via Convolutional Neural Networks

The traffic driving environment is a complex, dynamically changing scene in which drivers must pay close attention to salient and important targets or regions for safe driving. Modeling drivers' eye movements and attention allocation during driving can also help guide unmanned intelligent vehicles. However, few studies to date have modeled drivers' actual fixations and attention allocation while driving. To this end, we collect an eye-tracking dataset from 28 experienced drivers viewing 16 traffic driving videos. Based on this multi-driver attention allocation dataset, we propose a convolutional-deconvolutional neural network (CDNN) to predict drivers' eye fixations. Experimental results indicate that the proposed CDNN outperforms state-of-the-art saliency models and predicts drivers' attentional locations more accurately. The CDNN predicts the major fixation location and also reliably detects secondary important regions that cannot be ignored during driving, when they exist. Compared with current object detection models in autonomous and assisted driving systems, our human-like driving model does not detect every object appearing in the driving scene; instead, it provides only the most relevant regions or targets, which can greatly reduce interference from irrelevant scene information.
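To make the convolutional-deconvolutional idea concrete, the following is a minimal PyTorch sketch of an encoder-decoder network that maps a driving frame to a per-pixel fixation probability map. This is an illustrative toy, not the paper's actual CDNN: the layer counts, channel widths, and input size are assumptions chosen for brevity.

```python
# Hedged sketch of a convolutional-deconvolutional (encoder-decoder) network
# for fixation-map prediction. Illustrative only; NOT the paper's CDNN --
# depths, widths, and kernel sizes here are placeholder assumptions.
import torch
import torch.nn as nn


class FixationNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions compress the input frame spatially.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transposed convolutions restore the original resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # per-pixel fixation probability in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = FixationNet()
frame = torch.randn(1, 3, 64, 96)   # one RGB driving frame (toy resolution)
saliency = model(frame)             # predicted fixation map
print(saliency.shape)               # torch.Size([1, 1, 64, 96])
```

Training such a network would minimize a pixel-wise loss (e.g. binary cross-entropy) between the predicted map and Gaussian-blurred ground-truth fixation maps from the eye-tracking data.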
