DeepSafeDrive: A grammar-aware driver parsing approach to Driver Behavioral Situational Awareness (DB-SAW)

Abstract: This paper presents a Grammar-aware Driver Parsing (GDP) algorithm that uses deep features to build a novel driver behavior situational awareness system (DB-SAW). A deep model is first trained to extract highly discriminative features of the driver. A grammatical structure defined on these deep features then serves as prior knowledge for semi-supervised generation of proposal candidates. Finally, the Region with Convolutional Neural Networks (R-CNN) method is used to precisely segment the parts of the driver. The proposed method aims not only to find the parts of the driver automatically in challenging “drivers in the wild” databases, namely the standardized Strategic Highway Research Program (SHRP-2) and the Vision for Intelligent Vehicles and Application (VIVA) databases, but also to determine seat belt usage and the position of the driver's hands (on a phone vs. on the steering wheel). We conduct experiments on several applications and compare our GDP method against other state-of-the-art detection and segmentation approaches, namely SDS [1], CRF-RNN [2], DJTL [3], and R-CNN [4], on the SHRP-2 and VIVA databases.

[1] Bernt Schiele, et al. Pictorial structures revisited: People detection and articulated pose estimation, 2009, CVPR.

[2] Peter V. Gehler, et al. Poselet Conditioned Pictorial Structures, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3] Anders P. Eriksson, et al. Normalized Cuts Revisited: A Reformulation for Segmentation with Linear Grouping Constraints, 2007, ICCV.

[4] Mohan M. Trivedi, et al. On Performance Evaluation of Driver Hand Detection Algorithms: Challenges, Dataset, and Metrics, 2015, 2015 IEEE 18th International Conference on Intelligent Transportation Systems.

[5] Mohan M. Trivedi, et al. Head, Eye, and Hand Patterns for Driver Activity Recognition, 2014, 2014 22nd International Conference on Pattern Recognition.

[6] Song-Chun Zhu, et al. Integrating Grammar and Segmentation for Human Pose Estimation, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Xiaogang Wang, et al. Deep Learning Face Representation by Joint Identification-Verification, 2014, NIPS.

[8] Charless C. Fowlkes, et al. Contour Detection and Hierarchical Image Segmentation, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Andrew Zisserman, et al. Hand detection using multiple proposals, 2011, BMVC.

[10] Chen Qian, et al. Realtime and Robust Hand Tracking from Depth, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Vibhav Vineet, et al. Conditional Random Fields as Recurrent Neural Networks, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[12] Marios Savvides, et al. Driver cell phone usage detection on Strategic Highway Research Program (SHRP2) face view videos, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13] Ivan Laptev, et al. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[15] Iasonas Kokkinos, et al. Segmentation-Aware Deformable Part Models, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Jian Sun, et al. Cascaded hand pose regression, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Nathan D. Cahill, et al. Semi-Supervised Normalized Cuts for Image Segmentation, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18] Marwan Mattar, et al. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments, 2008.

[19] Mohan M. Trivedi, et al. Hand Gesture Recognition in Real Time for Automotive Interfaces: A Multimodal Vision-Based Approach and Evaluations, 2014, IEEE Transactions on Intelligent Transportation Systems.

[20] Jonathan T. Barron, et al. Multiscale Combinatorial Grouping, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21] Jitendra Malik, et al. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Jitendra Malik, et al. Simultaneous Detection and Segmentation, 2014, ECCV.

[23] Jitendra Malik, et al. Normalized cuts and image segmentation, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[24] Antti Oulasvirta, et al. Fast and robust hand tracking using detection-guided optimization, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Jianbo Shi, et al. Segmentation given partial grouping constraints, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26] Carl Olsson, et al. Normalized Cuts Revisited, 2011.

[27] Liang Lin, et al. Deep Joint Task Learning for Generic Object Extraction, 2014, NIPS.