The Multimodal Driver Monitoring Database: A Naturalistic Corpus to Study Driver Attention

A smart vehicle should be able to monitor the actions and behaviors of the human driver to provide critical warnings or intervene when necessary. Recent advances in deep learning and computer vision show great promise for monitoring human behaviors and activities. While these algorithms perform well in controlled environments, naturalistic driving conditions introduce new challenges such as illumination variations, occlusions, and extreme head poses. Training models that reliably predict driving-related tasks and effectively monitor driver actions and behaviors requires a vast amount of in-domain data. Toward building the required infrastructure, this paper presents the multimodal driver monitoring (MDM) dataset, collected from 59 subjects recorded while performing various tasks. We use the FiCap device, which continuously tracks the driver's head movement using fiducial markers, providing frame-based annotations to train head pose algorithms under naturalistic driving conditions. We ask the driver to look at predetermined gaze locations to obtain accurate correspondences between the driver's facial image and visual attention. We also collect data while the driver performs common secondary activities, such as navigating with a smartphone and operating the in-car infotainment system. All of the driver's activities are recorded with high-definition RGB cameras and a time-of-flight depth camera. We also record the controller area network bus (CAN-Bus), extracting important vehicle information. These high-quality recordings serve as an ideal resource for training efficient driver-monitoring algorithms, providing further advancements in the field of in-vehicle safety systems.
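Fiducial-based head tracking of the kind the FiCap device performs ultimately reduces to estimating a rigid transform between corresponding 3D marker points across frames. As a minimal illustrative sketch (not the paper's implementation), the rotation can be recovered with the classic SVD-based Kabsch/Arun alignment; the function name and array shapes here are assumptions for illustration only:

```python
import numpy as np

def rigid_rotation(p, q):
    """Estimate the rotation aligning point set p to q (Kabsch/Arun SVD method).

    p, q: (N, 3) arrays of corresponding 3D points, e.g. fiducial-marker
    centers detected on a head-mounted cap in a reference frame and in the
    current frame.
    """
    p_c = p - p.mean(axis=0)                 # center both point sets
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                          # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against a reflection
    s = np.diag([1.0, 1.0, d])
    return vt.T @ s @ u.T                    # R such that R @ p_c.T ≈ q_c.T
```

The recovered rotation matrix can then be converted to yaw, pitch, and roll angles to serve as frame-level head pose annotations.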
