Spatial-Temporal Fusion Convolutional Neural Network for Simulated Driving Behavior Recognition

Abnormal driving behaviour is one of the leading causes of serious traffic accidents that endanger human life. The study of driving behaviour surveillance has therefore become essential to traffic safety and public management. In this paper, we employ a two-stream CNN framework for video-based driving behaviour recognition, in which a spatial stream CNN captures appearance information from still frames, whilst a temporal stream CNN captures motion information from pre-computed optical flow displacement between a few adjacent video frames. We investigate different spatial-temporal fusion strategies to combine the intra-frame static cues and inter-frame dynamic cues for final behaviour recognition. To validate the effectiveness of the designed spatial-temporal deep learning model, we create a simulated driving behaviour dataset containing 1237 videos covering 6 different driving behaviours. Experimental results show that the proposed method achieves noticeable performance improvements over existing methods.
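As an illustration of what combining the two streams can look like, the following minimal sketch fuses per-class scores from a spatial and a temporal stream by weighted averaging of their softmax outputs. This is only one possible fusion rule, assumed here for illustration; the abstract does not state which fusion strategies were evaluated. The function names (`softmax`, `fuse_scores`, `predict`) and the `temporal_weight` parameter are hypothetical, and the input logits stand in for the outputs of two already-trained stream networks.

```python
import math
from typing import List, Sequence

NUM_CLASSES = 6  # the abstract reports 6 driving-behaviour classes


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(
    spatial_logits: Sequence[float],
    temporal_logits: Sequence[float],
    temporal_weight: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> List[float]:
    """Score-level fusion: weighted average of the two streams' softmax outputs."""
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= temporal_weight <= 1.0:
        raise ValueError("temporal_weight must lie in [0, 1]")
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    return [
        (1.0 - temporal_weight) * ps + temporal_weight * pt
        for ps, pt in zip(p_spatial, p_temporal)
    ]


def predict(
    spatial_logits: Sequence[float],
    temporal_logits: Sequence[float],
    temporal_weight: float = 0.5,
) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(spatial_logits, temporal_logits, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Made-up logits for one video clip: the spatial stream favours class 0,
    # the temporal stream favours class 1.
    spatial = [2.0, 0.5, 0.1, 0.0, 0.0, 0.0]
    temporal = [0.0, 3.0, 0.1, 0.0, 0.0, 0.0]
    print(predict(spatial, temporal, temporal_weight=0.5))
```

Setting `temporal_weight` to 0 or 1 reduces the sketch to a single-stream classifier, which gives a simple baseline for judging whether fusion helps on a given dataset.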
