Driver behavior recognition based on deep convolutional neural networks

Traffic safety is a severe problem around the world. Many road accidents are related to unsafe driving behavior, e.g. eating while driving. In this work, we propose a vision-based solution that recognizes the driver's behavior using convolutional neural networks. Specifically, given an image, skin-like regions are extracted by a Gaussian Mixture Model and passed to a deep convolutional neural network model, namely R*CNN, to generate action labels. The skin-like regions provide abundant semantic information with sufficient discriminative capability. In addition, R*CNN selects the most informative regions from the candidates to facilitate the final action recognition. We tested the proposed method on the Southeast University Driving-posture Dataset and achieved a mean Average Precision (mAP) of 97.76%, which shows that the proposed method is effective for driver action recognition.
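The two stages described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the chrominance (Cb, Cr) feature space, the diagonal covariances, every numeric parameter, the threshold, and the function names `gmm_density`, `skin_mask`, and `rstar_score` are assumptions made for illustration. The scoring function follows the published R*CNN formulation (primary-region score plus the maximum score over candidate secondary regions), assuming the skin-like regions serve as the secondary candidates.

```python
import math

# Hypothetical 2-component GMM over (Cb, Cr) chrominance. All numbers are
# illustrative placeholders, not parameters from the paper.
SKIN_GMM = [
    # (mixture weight, mean, diagonal variance)
    (0.6, (110.0, 150.0), (60.0, 50.0)),
    (0.4, (120.0, 140.0), (90.0, 80.0)),
]


def gmm_density(x, components):
    """Mixture density p(x) = sum_k w_k * N(x; mean_k, diag(var_k))."""
    total = 0.0
    for weight, mean, var in components:
        log_pdf = 0.0
        for xi, mi, vi in zip(x, mean, var):
            log_pdf += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        total += weight * math.exp(log_pdf)
    return total


def skin_mask(pixels, components, threshold):
    """Mark each (Cb, Cr) pixel as skin-like when its density exceeds threshold."""
    return [[gmm_density(p, components) > threshold for p in row] for row in pixels]


def rstar_score(primary_score, secondary_scores):
    """R*CNN-style action score: primary-region score plus the best
    score among the candidate secondary regions (must be non-empty)."""
    return primary_score + max(secondary_scores)


# Usage: one row with a skin-like pixel and a clearly non-skin pixel.
mask = skin_mask([[(110.0, 150.0), (30.0, 30.0)]], SKIN_GMM, threshold=1e-4)
# The max picks the most informative secondary region (score 0.7 here).
score = rstar_score(0.2, [0.1, 0.7, 0.3])
```

In a full pipeline, connected skin-like pixels in the mask would be grouped into bounding boxes, and the per-region scores would come from the CNN rather than being supplied by hand.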
