Driver Glance Classification In-the-wild: Towards Generalization Across Domains and Subjects

Distracted drivers are dangerous drivers. Equipping advanced driver assistance systems (ADAS) with the ability to detect driver distraction can help prevent accidents and improve driver safety. To detect driver distraction, an ADAS must be able to monitor the driver's visual attention. We propose a model that takes as input a patch of the driver's face along with a crop of the eye region and classifies the driver's glance into six coarse regions of interest (ROIs) in the vehicle. We demonstrate that an hourglass network, trained with an additional reconstruction loss, allows the model to learn stronger contextual feature representations than a traditional encoder-only classification module. To make the system robust to subject-specific variations in appearance and behavior, we design a personalized hourglass model tuned with an auxiliary input representing the driver's baseline glance behavior. Finally, we present a weakly supervised multi-domain training regimen that enables the hourglass to jointly learn representations from different domains (varying in camera type and angle), utilizing unlabeled samples and thereby reducing annotation cost.
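The abstract does not give the training objective in closed form, so the following is only a minimal sketch of how a classification loss and a reconstruction loss might be combined when some samples are unlabeled. The function names (`cross_entropy`, `mse`, `joint_loss`), the `recon_weight` parameter, and the convention that an unlabeled sample carries the label `None` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

NUM_ROIS = 6  # six coarse glance regions, as stated in the abstract

# One sample: (class logits, label or None, reconstruction, reconstruction target).
# This tuple layout is a hypothetical convention for this sketch.
Sample = Tuple[Sequence[float], Optional[int], Sequence[float], Sequence[float]]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mse(recon: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a reconstruction and its target."""
    return sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)


def joint_loss(batch: List[Sample], recon_weight: float = 1.0) -> float:
    """Classification loss on labeled samples plus reconstruction loss on all.

    Unlabeled samples (label is None) contribute only to the reconstruction
    term, which is one plausible way to use unlabeled data from a new domain.
    """
    cls_terms = [cross_entropy(lg, y) for lg, y, _, _ in batch if y is not None]
    rec_terms = [mse(r, t) for _, _, r, t in batch]
    cls = sum(cls_terms) / len(cls_terms) if cls_terms else 0.0
    rec = sum(rec_terms) / len(rec_terms)
    return cls + recon_weight * rec
```

In this sketch a batch made up entirely of unlabeled samples still yields a usable training signal through the reconstruction term, which is the property that lets unlabeled data reduce annotation cost.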
