Real-Time Driver State Monitoring Using a CNN Based Spatio-Temporal Approach*

Many road accidents occur due to distracted drivers. Today, driver monitoring is essential even for the latest autonomous vehicles to alert distracted drivers in order to take over control of the vehicle in case of emergency. In this paper, a spatio-temporal approach is applied to classify drivers’ distraction level and movement decisions using convolutional neural networks (CNNs). We approach this problem as action recognition to benefit from temporal information in addition to spatial information. Our approach relies on features extracted from sparsely selected frames of an action using a pre-trained BN-Inception network. Experiments show that our approach outperforms the state-of-the art results on the Distracted Driver Dataset (96.31%), with an accuracy of 99.10% for 10-class classification while providing real-time performance. We also analyzed the impact of fusion using RGB and optical flow modalities with a very recent data level fusion strategy. The results on the Distracted Driver and Brain4Cars datasets show that fusion of these modalities further increases the accuracy.

[1]  Gerhard Rigoll,et al.  Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[2]  Luke Fletcher,et al.  Driver Inattention Detection based on Eye Gaze—Road Event Correlation , 2009, Int. J. Robotics Res..

[3]  Luc Van Gool,et al.  Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.

[4]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[5]  Hesham M. Eraqi,et al.  Real-time Distracted Driver Posture Classification , 2017, ArXiv.

[6]  Jeremy S. Smith,et al.  Driver behavior recognition based on deep convolutional neural networks , 2016, 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD).

[7]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[11]  Hema Swetha Koppula,et al.  Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[12]  Sanjay N. Talbar,et al.  Detection of Distracted Driver Using Convolutional Neural Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13]  T. Govardhan,et al.  Driver Gaze Tracking And Eyes Off The Road Detection System , 2017 .

[14]  Peter Corcoran,et al.  Transfer Learning of Temporal Information for Driver Action Classification , 2017, Midwest Artificial Intelligence and Cognitive Science Conference.

[15]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[16]  Thomas Brox,et al.  High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.

[17]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[18]  Hema Swetha Koppula,et al.  Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture , 2016, ArXiv.

[19]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[20]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[21]  Hiroshi Ishiguro,et al.  DEVELOPMENT OF FACIAL-DIRECTION DETECTION SENSOR , 2006 .

[22]  Andreas Savakis,et al.  Distracted Driver Detection: Deep Learning vs Handcrafted Features , 2017 .

[23]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.