The Multimodal Driver Monitoring Database: A Naturalistic Corpus to Study Driver Attention

A smart vehicle should be able to monitor the actions and behaviors of the human driver to provide critical warnings or intervene when necessary. Recent advances in deep learning and computer vision show great promise for monitoring human behaviors and activities. While these algorithms work well in controlled environments, naturalistic driving conditions introduce new challenges such as illumination variations, occlusions, and extreme head poses. Effectively monitoring driver actions and behaviors requires a vast amount of in-domain data to train models that perform well on driving-related prediction tasks. Toward building this infrastructure, this paper presents the multimodal driver monitoring (MDM) dataset, collected from 59 subjects recorded while performing various tasks. We use the Fi-Cap device, which continuously tracks the driver's head movement using fiducial markers, providing frame-based annotations to train head pose algorithms in naturalistic driving conditions. We ask the driver to look at predetermined gaze locations to obtain accurate correspondences between the driver's facial image and visual attention. We also collect data while the driver performs common secondary activities, such as navigating with a smartphone and operating the in-car infotainment system. All of the driver's activities are recorded with high-definition RGB cameras and a time-of-flight depth camera. We also record the controller area network bus (CAN-Bus), extracting important vehicle information. These high-quality recordings are an ideal resource for training efficient driver-monitoring algorithms, enabling further advancements in in-vehicle safety systems.
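The abstract does not detail how frame-based head pose annotations are derived from the Fi-Cap's fiducial markers, but a standard approach is least-squares rigid alignment of the detected marker positions in 3-D against a reference marker configuration (the classic Arun/Kabsch solution). The sketch below is illustrative only: the function name, array shapes, and the assumption that per-frame marker 3-D coordinates are available are ours, not the paper's.

```python
import numpy as np

def rigid_transform(ref_pts, obs_pts):
    """Least-squares rigid alignment of two 3-D point sets (Kabsch/Arun).

    ref_pts: (N, 3) marker positions in the head-cap reference frame.
    obs_pts: (N, 3) the same markers detected in the camera frame.
    Returns rotation R (3x3) and translation t (3,) such that
    obs_pts ~= ref_pts @ R.T + t; R encodes the head orientation.
    """
    # Center both point sets on their centroids.
    ref_c = ref_pts.mean(axis=0)
    obs_c = obs_pts.mean(axis=0)
    # Cross-covariance matrix between the centered sets.
    H = (ref_pts - ref_c).T @ (obs_pts - obs_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in degenerate configurations.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = obs_c - R @ ref_c
    return R, t
```

Run per frame on the detected fiducials, this yields the continuous head-pose track (rotation plus translation) that serves as the frame-level annotation for training head pose estimators.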
