论文信息 - Driver behavior recognition based on deep convolutional neural networks

Driver behavior recognition based on deep convolutional neural networks

Traffic safety is a severe problem around the world. Many road accidents are normally related with the driver's unsafe driving behavior, e.g. eating while driving. In this work, we propose a vision-based solution to recognize the driver's behavior based on convolutional neural networks. Specifically, given an image, skin-like regions are extracted by Gaussian Mixture Model, which are passed to a deep convolutional neural networks model, namely R*CNN, to generate action labels. The skin-like regions are able to provide abundant semantic information with sufficient discriminative capability. Also, R*CNN is able to select the most informative regions from candidates to facilitate the final action recognition. We tested the proposed methods on Southeast University Driving-posture Dataset and achieve mean Average Precision(mAP) of 97.76% on the dataset which prove the proposed method is effective in drivers's action recognition.

[1] Carme Torras,et al. Action Recognition Based on Efficient Deep Feature Learning in the Spatio-Temporal Domain , 2016, IEEE Robotics and Automation Letters.

[2] Leonidas J. Guibas,et al. Human action recognition by learning bases of action attributes and parts , 2011, 2011 International Conference on Computer Vision.

[3] Andrew Zisserman,et al. The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[4] Chengjun Liu,et al. A Bayesian Discriminating Features Method for Face Detection , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[5] Frans Coenen,et al. Driving posture recognition by convolutional neural networks , 2016, IET Comput. Vis..

[6] Dong-Chen He,et al. Texture Unit, Texture Spectrum, And Texture Analysis , 1990 .

[7] Alexander H. Waibel,et al. A real-time face tracker , 1996, Proceedings Third IEEE Workshop on Applications of Computer Vision. WACV'96.

[8] Fei-Fei Li,et al. Grouplet: A structured image representation for recognizing human and object interactions , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9] Guodong Guo,et al. A survey on still image based human action recognition , 2014, Pattern Recognit..

[10] Bailing Zhang,et al. Recognition of driving postures by contourlet transform and random forests , 2012 .

[11] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[12] Barbara Caputo,et al. Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[13] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[14] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Cordelia Schmid,et al. Action recognition by dense trajectories , 2011, CVPR 2011.

[16] Narendra Ahuja,et al. Gaussian mixture model for human skin color and its applications in image and video databases , 1998, Electronic Imaging.

[18] Jitendra Malik,et al. Actions and Attributes from Wholes and Parts , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[19] Ivan Laptev,et al. Recognizing human actions in still images: a study of bag-of-features and part-based representations , 2010, BMVC.

[20] Jitendra Malik,et al. R-CNNs for Pose Estimation and Action Detection , 2014, ArXiv.

[21] Andrew Zisserman,et al. Representing shape with a spatial pyramid kernel , 2007, CIVR '07.

[22] Koen E. A. van de Sande,et al. Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[23] James M. Rehg,et al. Statistical Color Models with Application to Skin Detection , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[24] Antonio Torralba,et al. SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26] Yupin Luo,et al. Making full use of spatial-temporal interest points: An AdaBoost approach for action recognition , 2010, 2010 IEEE International Conference on Image Processing.

[27] Y. LeCun,et al. Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[28] Fei-Fei Li,et al. Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29] Gang Yu,et al. Discriminative Orderlet Mining for Real-Time Recognition of Human-Object Interaction , 2014, ACCV.

[30] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[31] Jitendra Malik,et al. Contextual Action Recognition with R*CNN , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).