Learning where to attend like a human driver

Despite the advent of autonomous cars, it is likely that, at least in the near future, human attention will retain a central role as a guarantee of legal responsibility during the driving task. In this paper we study the dynamics of the driver's gaze and use it as a proxy to understand the related attentional mechanisms. First, we build our analysis upon two questions: where and what is the driver looking at? Second, we model the driver's gaze by training a coarse-to-fine convolutional network on short sequences extracted from the DR(eye)VE dataset. Experimental comparison against different baselines reveals that the driver's gaze can indeed be learnt to some extent, despite i) being highly subjective and ii) having only one driver's gaze available for each sequence, due to the irreproducibility of the scene. Finally, we advocate a new assisted-driving paradigm that suggests to the driver, with no intervention, where she should focus her attention.
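To make the modeling step more concrete, the following is a minimal sketch (in PyTorch) of a coarse-to-fine video saliency predictor of the kind the abstract describes: a coarse branch processes a down-sampled clip with 3D convolutions, and a fine branch refines the up-sampled coarse map against the last full-resolution frame. This is an illustrative assumption, not the architecture actually trained on DR(eye)VE; the class name, layer sizes, clip length, and down-sampling factor are all hypothetical.

    # Hedged sketch of a coarse-to-fine gaze/saliency predictor for short driving
    # clips. All design choices below are assumptions for illustration only.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CoarseToFineGaze(nn.Module):
        """Hypothetical coarse-to-fine fixation-map predictor for short clips."""

        def __init__(self):
            super().__init__()
            # Coarse branch: 3D convolutions over a down-sampled clip, then the
            # temporal axis is collapsed into a single coarse feature map.
            self.coarse3d = nn.Sequential(
                nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d((1, None, None)),   # (B, 32, 1, H/4, W/4)
            )
            self.coarse_head = nn.Conv2d(32, 1, kernel_size=1)
            # Fine branch: 2D convolutions refine the up-sampled coarse map
            # together with the last full-resolution frame of the clip.
            self.fine = nn.Sequential(
                nn.Conv2d(3 + 1, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(32, 1, kernel_size=1),
            )

        def forward(self, clip):
            # clip: (B, 3, T, H, W) tensor of RGB frames in [0, 1]
            b, _, t, h, w = clip.shape
            small = F.interpolate(clip, size=(t, h // 4, w // 4),
                                  mode="trilinear", align_corners=False)
            coarse = self.coarse3d(small).squeeze(2)          # (B, 32, H/4, W/4)
            coarse_map = self.coarse_head(coarse)             # (B, 1, H/4, W/4)
            coarse_up = F.interpolate(coarse_map, size=(h, w),
                                      mode="bilinear", align_corners=False)
            last_frame = clip[:, :, -1]                       # (B, 3, H, W)
            fine_map = self.fine(torch.cat([last_frame, coarse_up], dim=1))
            # Spatial softmax yields a probability map that can be compared with
            # a ground-truth fixation map, e.g. via KL divergence during training.
            return F.softmax(fine_map.flatten(2), dim=-1).view(b, 1, h, w)

    # Example: a batch of two 16-frame clips at 112x112 resolution.
    model = CoarseToFineGaze()
    clips = torch.rand(2, 3, 16, 112, 112)
    fixation_maps = model(clips)   # (2, 1, 112, 112), each map sums to 1

The intuition behind this kind of split is that the coarse branch captures scene context and motion cheaply at low resolution, while the fine branch recovers spatial detail; the model actually evaluated on DR(eye)VE may differ substantially in depth, inputs, and training loss.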

[1] Jean-Philippe Tarel et al. Alerting the drivers about road signs with poor visual saliency, 2009, IEEE Intelligent Vehicles Symposium.

[2] Nicolas Pugeault et al. How Much of Driving Is Preattentive?, 2015, IEEE Transactions on Vehicular Technology.

[3] Fei-Fei Li et al. Large-Scale Video Classification with Convolutional Neural Networks, 2014, IEEE Conference on Computer Vision and Pattern Recognition.

[4] Andrea Palazzi et al. DR(eye)VE: A Dataset for Attention-Based Tasks with Applications to Autonomous and Assisted Driving, 2016, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[5] Lihi Zelnik-Manor et al. Learning Video Saliency from Human Gaze Using Candidate Selection, 2013, IEEE Conference on Computer Vision and Pattern Recognition.

[6] Vladlen Koltun et al. Multi-Scale Context Aggregation by Dilated Convolutions, 2015, ICLR.

[7] Michael I. Jordan et al. Deep Transfer Learning with Joint Adaptation Networks, 2016, ICML.

[8] Bernhard Schölkopf et al. A Kernel Method for the Two-Sample-Problem, 2006, NIPS.

[9] Bernhard Schölkopf et al. A Kernel Two-Sample Test, 2012, J. Mach. Learn. Res.

[10] Susan Shelly et al. The encyclopedia of blindness and vision impairment, 1991.

[11] Alain Muzet et al. Influence of age, speed and duration of monotonous driving task in traffic on the driver's useful visual field, 2004, Vision Research.

[12] Mohan M. Trivedi et al. Where is the driver looking: Analysis of head, eye and iris for robust gaze zone estimation, 2014, 17th International IEEE Conference on Intelligent Transportation Systems (ITSC).

[13] R. Bowden et al. How much of driving is pre-attentive?, 2015.

[14] Markus Enzweiler et al. Will this car change the lane? - Turn signal recognition in the frequency domain, 2014, IEEE Intelligent Vehicles Symposium Proceedings.

[15] Fatih Murat Porikli et al. Saliency-aware geodesic video object segmentation, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] T. Govardhan et al. Driver Gaze Tracking And Eyes Off The Road Detection System, 2017.

[17] Frédo Durand et al. What Do Different Evaluation Metrics Tell Us About Saliency Models?, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18] Bernhard Schölkopf et al. Kernel Measures of Conditional Dependence, 2007, NIPS.

[19] Alex Fridman et al. Driver Gaze Estimation Without Using Eye Movement, 2015, arXiv.

[20] Andrea Palazzi et al. A Statistical Test for Joint Distributions Equivalence, 2016, arXiv.

[21] Seong G. Kong et al. Visual Analysis of Eye State and Head Pose for Driver Alertness Monitoring, 2013, IEEE Transactions on Intelligent Transportation Systems.

[22] Mohan M. Trivedi et al. Robust and continuous estimation of driver gaze zone by dynamic analysis of multiple face videos, 2014, IEEE Intelligent Vehicles Symposium Proceedings.

[23] Tobias Bär et al. Driver head pose and gaze estimation based on multi-template ICP 3-D point cloud alignment, 2012, 15th International IEEE Conference on Intelligent Transportation Systems.

[24] Yan Liu et al. Video Saliency Detection via Dynamic Consistent Spatio-Temporal Attention Modelling, 2013, AAAI.

[25] Sebastian Ramos et al. The Cityscapes Dataset for Semantic Urban Scene Understanding, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Anup Doshi et al. Lane change intent prediction for driver assistance: On-road design and evaluation, 2011, IEEE Intelligent Vehicles Symposium (IV).

[27] Mathias Perrollaz et al. Learning-based approach for online lane change intention prediction, 2013, IEEE Intelligent Vehicles Symposium (IV).

[28] Lorenzo Torresani et al. Learning Spatiotemporal Features with 3D Convolutional Networks, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[29] Cristian Sminchisescu et al. Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition, 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30] Firas Lethaus et al. Using Pattern Recognition to Predict Driver Intent, 2011, ICANNGA.

[31] Le Song et al. A Hilbert Space Embedding for Distributions, 2007, Discovery Science.

[32] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[33] Bryan Reimer et al. Driver Gaze Region Estimation Without Using Eye Movement, 2015.

[34] Ling Shao et al. Consistent Video Saliency Using Local Gradient Flow Optimization and Global Refinement, 2015, IEEE Transactions on Image Processing.

[35] Hema Swetha Koppula et al. Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models, 2015, IEEE International Conference on Computer Vision (ICCV).

[36] Nicu Sebe et al. Combining Head Pose and Eye Location Information for Gaze Estimation, 2012, IEEE Transactions on Image Processing.

[37] Thomas Mauthner et al. Encoding based saliency detection for videos and images, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).