Driver Behavior Recognition via Interwoven Deep Convolutional Neural Nets With Multi-Stream Inputs

Understanding driver activity is vital for in-vehicle systems that aim to reduce the incidence of car accidents rooted in cognitive distraction. Automating real-time behavior recognition while classifying actions with high accuracy is challenging, however, given the multitude of circumstances surrounding drivers, the unique traits of individuals, and the computational constraints imposed by in-vehicle embedded platforms. Prior work fails to jointly meet these runtime and accuracy requirements and mostly relies on a single sensing modality, which can in turn become a single point of failure. In this paper, we harness the exceptional feature extraction abilities of deep learning and propose a dedicated Interwoven Deep Convolutional Neural Network (InterCNN) architecture to tackle the problem of accurate classification of driver behaviors in real-time. The proposed solution exploits information from multi-stream inputs, i.e., in-vehicle cameras with different fields of view and optical flows computed from the recorded images, and merges the abstract features it extracts through multiple fusion layers. This builds a tight ensembling system, which significantly improves the robustness of the model. In addition, we introduce a temporal voting scheme based on historical inference instances, in order to enhance the classification accuracy. Experiments conducted with a dataset that we collect in a mock-up car environment demonstrate that the proposed InterCNN with MobileNet convolutional blocks can classify 9 different behaviors with 73.97% accuracy, and 5 ‘aggregated’ behaviors with 81.66% accuracy. We further show that our architecture is highly computationally efficient, as it performs inferences within 15 ms, which satisfies the real-time constraints of intelligent cars. Moreover, our InterCNN is robust to lossy input, as the classification remains accurate when two input streams are occluded.
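
The abstract does not spell out how the temporal voting scheme combines historical inference instances. As a minimal sketch only, the following assumes a plain majority vote over a sliding window of the most recent per-frame predictions, with ties resolved in favor of the most recent label. The class name `TemporalVoter`, the `window` parameter, and the behavior labels are hypothetical and are not taken from the paper.

```python
from collections import Counter, deque


class TemporalVoter:
    """Majority vote over the most recent per-frame predictions.

    Hypothetical helper: the voting rule used by InterCNN may differ.
    """

    def __init__(self, window=5):
        # Keep only the last `window` predicted labels.
        self.history = deque(maxlen=window)

    def update(self, label):
        """Record a new per-frame prediction and return the voted label."""
        self.history.append(label)
        counts = Counter(self.history)
        best = max(counts.values())
        # Among the top-voted labels, prefer the most recently seen one.
        for past in reversed(self.history):
            if counts[past] == best:
                return past


# Example: an isolated "eating" prediction is suppressed by its neighbors.
voter = TemporalVoter(window=3)
raw = ["texting", "texting", "eating", "texting", "eating", "eating"]
smoothed = [voter.update(p) for p in raw]
```

A larger window would smooth out more isolated misclassifications, at the cost of reacting more slowly when the driver genuinely switches behavior.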
