Driver Drowsiness Recognition via 3D Conditional GAN and Two-Level Attention Bi-LSTM

Driver drowsiness is a serious threat to road safety, so it is vital to develop effective drowsiness recognition algorithms that help prevent traffic accidents. However, recognizing drowsiness remains very challenging because of large intra-class variations in facial expression, head pose and illumination conditions. In this paper, we propose a new deep learning framework for robust driver drowsiness recognition that combines a 3D conditional generative adversarial network with a two-level attention bidirectional long short-term memory network (3DcGAN-TLABiLSTM). To extract short-term spatial-temporal features rich in drowsiness-related information, we design a 3D encoder-decoder generator, conditioned on auxiliary information, that generates high-quality fake image sequences, together with a 3D discriminator that learns drowsiness-related representations from the spatial-temporal domain. In addition, for long-term spatial-temporal fusion, we investigate the use of a two-level attention mechanism that guides the bidirectional long short-term memory network in learning the saliency of short-term memory information and long-term temporal information. We evaluate the 3DcGAN-TLABiLSTM framework on the public NTHU-DDD dataset. Experimental results show that the proposed approach achieves higher drowsiness recognition precision than state-of-the-art methods.
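The abstract does not spell out how the two levels of attention are computed. As a minimal sketch, assuming the additive attention pooling commonly used with attention-based BiLSTMs (score e_t = wᵀ tanh(h_t), weights α = softmax(e), output r = Σ α_t h_t), and assuming that the first level pools frame-level hidden states inside each short-term clip while the second level pools the resulting clip vectors across the long-term sequence, the pooling step could look like the plain Python below. The hidden states are taken as given (no BiLSTM or 3D GAN is implemented), and the names `attention_pool`, `two_level_attention`, `w_short` and `w_long` are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: Sequence[Vector], w: Vector) -> Tuple[List[float], Vector]:
    """Additive attention pooling (hypothetical helper).

    Each state h_t is scored as w . tanh(h_t); the scores are normalised
    with softmax and the states are summed with those weights.
    Returns (attention weights, pooled vector).
    """
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in states]
    alpha = softmax(scores)
    dim = len(states[0])
    pooled = [sum(a * h[d] for a, h in zip(alpha, states)) for d in range(dim)]
    return alpha, pooled


def two_level_attention(
    clips: Sequence[Sequence[Vector]], w_short: Vector, w_long: Vector
) -> Tuple[List[float], Vector]:
    """Assumed two-level scheme (not confirmed by the paper).

    Level 1: pool the frame-level states inside each short-term clip.
    Level 2: pool the clip-level vectors across the long-term sequence.
    Returns (clip-level attention weights, sequence-level vector).
    """
    clip_vectors = [attention_pool(clip, w_short)[1] for clip in clips]
    return attention_pool(clip_vectors, w_long)


if __name__ == "__main__":
    # Two toy clips of three 2-D hidden states each (made-up values).
    clips = [
        [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5]],
        [[0.9, 0.1], [0.7, 0.2], [0.8, 0.0]],
    ]
    weights, summary = two_level_attention(clips, w_short=[1.0, 1.0], w_long=[1.0, -1.0])
    print("clip weights:", weights)
    print("sequence vector:", summary)
```

Subtracting the maximum score before exponentiating keeps the softmax stable for large scores. In a trained model, `w_short` and `w_long` would be learned parameters and the states would come from the BiLSTM rather than being written by hand; with all-zero weights the scheme reduces to a mean of clip means.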
