High-Resolution Neural Network for Driver Visual Attention Prediction

Driving is a task that puts heavy demands on visual information, thereby the human visual system plays a critical role in making proper decisions for safe driving. Understanding a driver’s visual attention and relevant behavior information is a challenging but essential task in advanced driver-assistance systems (ADAS) and efficient autonomous vehicles (AV). Specifically, robust prediction of a driver’s attention from images could be a crucial key to assist intelligent vehicle systems where a self-driving car is required to move safely interacting with the surrounding environment. Thus, in this paper, we investigate a human driver’s visual behavior in terms of computer vision to estimate the driver’s attention locations in images. First, we show that feature representations at high resolution improves visual attention prediction accuracy and localization performance when being fused with features at low-resolution. To demonstrate this, we employ a deep convolutional neural network framework that learns and extracts feature representations at multiple resolutions. In particular, the network maintains the feature representation with the highest resolution at the original image resolution. Second, attention prediction tends to be biased toward centers of images when neural networks are trained using typical visual attention datasets. To avoid overfitting to the center-biased solution, the network is trained using diverse regions of images. Finally, the experimental results verify that our proposed framework improves the prediction accuracy of a driver’s attention locations.

[1]  Dong Liu,et al.  High-Resolution Representations for Labeling Pixels and Regions , 2019, ArXiv.

[2]  Frédo Durand,et al.  What Do Different Evaluation Metrics Tell Us About Saliency Models? , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  L. Itti,et al.  Quantifying center bias of observers in free viewing of dynamic natural scenes. , 2009, Journal of vision.

[4]  Neil D. B. Bruce,et al.  A Deeper Look at Saliency: Feature Contrast, Semantics, and Beyond , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[6]  Xiaojuan Qi,et al.  ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[7]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Byeongkeun Kang,et al.  A computational framework for driver's visual attention using a fully convolutional architecture , 2017, 2017 IEEE Intelligent Vehicles Symposium (IV).

[9]  Cynthia Owsley,et al.  Vision and driving , 2010, Vision Research.

[10]  Yifan Zhang,et al.  Spatial Attention Fusion for Obstacle Detection Using MmWave Radar and Vision Sensor , 2020, Sensors.

[11]  Jooyoung Park,et al.  Prediction of Driver’s Intention of Lane Change by Augmenting Sensor Information Using Machine Learning Techniques , 2017, Sensors.

[12]  Andrea Palazzi,et al.  Predicting the Driver's Focus of Attention: The DR(eye)VE Project , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Benjamin W Tatler,et al.  The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions. , 2007, Journal of vision.

[14]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[15]  A. Leon-Garcia,et al.  Probability, statistics, and random processes for electrical engineering , 2008 .

[16]  David Whitney,et al.  Predicting Driver Attention in Critical Situations , 2017, ACCV.

[17]  Lester C. Loschky,et al.  Scene perception from central to peripheral vision. , 2017, Journal of vision.

[18]  S. Dwivedi,et al.  Obesity May Be Bad: Compressed Convolutional Networks for Biomedical Image Segmentation , 2020 .

[19]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[22]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  John F. Canny,et al.  Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Keneth Sedilla,et al.  Driver Distraction: Determining the Ideal Location of a Navigation Device for Transportation Network Vehicle Services (TNVS) Drivers in Metro Manila , 2018 .

[25]  Pietro Perona,et al.  Graph-Based Visual Saliency , 2006, NIPS.

[26]  Nicolas Pugeault,et al.  How Much of Driving Is Preattentive? , 2015, IEEE Transactions on Vehicular Technology.

[27]  C. Koch,et al.  Computational modelling of visual attention , 2001, Nature Reviews Neuroscience.

[28]  Frédo Durand,et al.  A Benchmark of Computational Models of Saliency to Predict Human Fixations , 2012 .

[29]  Mohan M. Trivedi,et al.  Are all objects equal? Deep spatio-temporal importance prediction in driving videos , 2017, Pattern Recognit..

[30]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[31]  Bastian Leibe,et al.  Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Nicu Sebe,et al.  Combining Head Pose and Eye Location Information for Gaze Estimation , 2012, IEEE Transactions on Image Processing.

[33]  Henry Stark,et al.  Probability, Statistics, and Random Processes for Engineers , 2011 .

[34]  Alex Fridman,et al.  Driver Gaze Region Estimation without Use of Eye Movement , 2015, IEEE Intelligent Systems.

[35]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[37]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[38]  Firas Lethaus,et al.  Using Pattern Recognition to Predict Driver Intent , 2011, ICANNGA.

[39]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[40]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.