Probabilistic Estimation of the Gaze Region of the Driver using Dense Classification

The ability to monitor the visual attention of a driver is a useful feature for smart vehicles to understand the driver's intents and behaviors. The gaze angle of the driver is not deterministically related to his/her head pose because of the interplay between head and eye movements. Therefore, this study aims to establish a probabilistic relationship between them using deep learning. While probabilistic regression techniques such as Gaussian process regression (GPR) have previously been used to predict the visual attention of a driver, the proposed deep learning framework is a more generic approach: it does not assume a particular form for the distribution of the gaze, and instead learns the relationship between gaze and head pose directly from the data. In our formulation, the continuous gaze angles are converted into intervals, and the grid of quantized angles is treated as an image for dense prediction. We rely on convolutional neural networks (CNNs) with upsampling to map the six degrees of freedom of the head (three for orientation, three for position) onto this grid of gaze angles. We train and evaluate the proposed network with data collected from drivers who were asked to look at predetermined locations inside a car during naturalistic driving recordings. The proposed model obtains promising results: the gaze region that achieves 95% accuracy covers only 11.73% of a half sphere centered at the driver, which approximates his/her field of view. More generally, the architecture offers an appealing and general way to recast regression problems as dense classification problems.
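
The quantization step, the dense-classification target, and the size of a 95% gaze region can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the angle range of [-90, 90] degrees for both yaw and pitch, the bin width, the softmax taken over the whole grid, and every helper name (`quantize`, `one_hot_target`, `softmax_grid`, `cross_entropy`, `cell_solid_angle`, `region_fraction`) are hypothetical. The region is taken here as the smallest set of most probable cells whose probabilities sum to at least 0.95, with each cell weighted by its solid angle; the paper's exact definition of the region may differ. The CNN itself is omitted; any network that outputs an n-by-n grid of logits could be plugged in.

```python
import math

# Hypothetical setup: yaw (columns) and pitch (rows) both span [-90, 90]
# degrees, which covers the half sphere in front of the driver.
ANGLE_MIN, ANGLE_MAX = -90.0, 90.0


def grid_size(bin_deg):
    """Number of intervals per axis for a given bin width in degrees."""
    return int(round((ANGLE_MAX - ANGLE_MIN) / bin_deg))


def quantize(angle_deg, bin_deg):
    """Map a continuous angle to the index of its interval, clamped to the grid."""
    n = grid_size(bin_deg)
    idx = int((angle_deg - ANGLE_MIN) // bin_deg)
    return min(max(idx, 0), n - 1)


def one_hot_target(yaw_deg, pitch_deg, bin_deg):
    """Build the 'image' target: a pitch-by-yaw grid with a single 1 at the gaze cell."""
    n = grid_size(bin_deg)
    grid = [[0.0] * n for _ in range(n)]
    grid[quantize(pitch_deg, bin_deg)][quantize(yaw_deg, bin_deg)] = 1.0
    return grid


def softmax_grid(logits):
    """Assumed output layer: one softmax over all cells, so the map is a single distribution."""
    m = max(v for row in logits for v in row)
    exps = [[math.exp(v - m) for v in row] for row in logits]
    z = sum(sum(row) for row in exps)
    return [[v / z for v in row] for row in exps]


def cross_entropy(prob_grid, target_grid, eps=1e-12):
    """Dense-classification loss: -sum(target * log(prob)) over all cells."""
    return -sum(
        t * math.log(p + eps)
        for prow, trow in zip(prob_grid, target_grid)
        for p, t in zip(prow, trow)
    )


def cell_solid_angle(row, bin_deg):
    """Solid angle (sr) of one cell in a given pitch row; it shrinks toward +/-90 pitch."""
    lo = math.radians(ANGLE_MIN + row * bin_deg)
    hi = math.radians(ANGLE_MIN + (row + 1) * bin_deg)
    return math.radians(bin_deg) * (math.sin(hi) - math.sin(lo))


def region_fraction(prob_grid, bin_deg, coverage=0.95):
    """Fraction of the half sphere (2*pi sr) covered by the smallest set of
    most probable cells whose total probability reaches `coverage`."""
    cells = sorted(
        ((p, r) for r, row in enumerate(prob_grid) for p in row),
        reverse=True,
    )
    total_p, area = 0.0, 0.0
    for p, r in cells:
        if total_p >= coverage:
            break
        total_p += p
        area += cell_solid_angle(r, bin_deg)
    return area / (2 * math.pi)
```

In a full model, the logit grid would come from the upsampling layers of the CNN fed with the six head-pose values, and `region_fraction` applied to the softmax output would give a per-sample region size. Weighting cells by solid angle matters because equal-angle bins near +/-90 degrees of pitch cover far less of the sphere than bins near the horizon.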
