Continuous Driver Activity Recognition from Short Isolated Action Sequences

Advanced driver monitoring systems significantly increase safety by detecting driver drowsiness or distraction. Knowing the driver’s current state or actions allows for adaptive warning strategies or for predicting the driver’s response time when taking back control of a semi-autonomous vehicle. We present an online driver monitoring system that detects characteristic actions and states inside a car interior by analysing the full driver seat region. The proposed training method allows a recurrent neural network for online sequence analysis to learn from isolated action snippets only, while the trained network can still be applied to continuous video streams at runtime. With a mean average precision of 0.77, we reach better classification results on our test data than commonly used methods.
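For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean average precision can be computed from per-class scores. It assumes the common non-interpolated definition (the mean of the precision values at each correctly ranked positive sample, averaged over classes). The abstract does not state the exact evaluation protocol, so this is an illustration rather than the authors' evaluation code, and the class names and scores are hypothetical toy values.

```python
from typing import Dict, List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision for a single class.

    scores: classifier confidence per sample (higher means more confident).
    labels: 1 if the sample truly belongs to the class, else 0.
    """
    # Rank samples by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            # Precision measured at the rank of each true positive.
            precisions.append(hits / rank)
    # Convention chosen here (an assumption): a class with no positives scores 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    per_class: Dict[str, Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """Unweighted mean of the per-class average precision values."""
    aps = [average_precision(s, l) for s, l in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical toy data: two made-up action classes, four samples each.
toy = {
    "phone": ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    "drink": ([0.7, 0.6, 0.2, 0.1], [1, 1, 0, 0]),
}

if __name__ == "__main__":
    print(round(mean_average_precision(toy), 4))
```

In this toy case the second class is ranked perfectly, while the first class has one negative sample ranked above a positive one, which lowers its average precision and therefore the mean.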
