AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception

Driver distraction has become a significant cause of severe traffic accidents over the past decade. Despite the growing development of vision-driven driver monitoring systems, the lack of comprehensive perception datasets limits progress in road safety and traffic security. In this paper, we present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle in naturalistic scenarios. AIDE facilitates holistic driver monitoring through three distinctive characteristics: multi-view settings of driver and scene; multi-modal annotations of face, body, posture, and gesture; and four pragmatic task designs for driving understanding. To thoroughly explore AIDE, we provide experimental benchmarks on three kinds of baseline frameworks using a wide range of methods. Moreover, two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations. We also systematically investigate the importance and rationality of the key components in AIDE and the benchmarks. The project link is https://github.com/ydk122024/AIDE.
