DriveAHead — A Large-Scale Driver Head Pose Dataset

Head pose monitoring is an important task for driver assistance systems, since it is a key indicator for human attention and behavior. However, current head pose datasets either lack complexity or do not adequately represent the conditions that occur while driving. Therefore, we introduce DriveAHead, a novel dataset designed to develop and evaluate head pose monitoring algorithms in real driving conditions. We provide frame-by-frame head pose labels obtained from a motion-capture system, as well as annotations about occlusions of the driver's face. To the best of our knowledge, DriveAHead is the largest publicly available driver head pose dataset, and also the only one that provides 2D and 3D data aligned at the pixel level using the Kinect v2. Existing performance metrics are based on the mean error without any consideration of the bias towards one position or another. Here, we suggest a new performance metric, named Balanced Mean Angular Error, that addresses the bias towards the forward looking position existing in driving datasets. Finally, we present the Head Pose Network, a deep learning model that achieves better performance than current state-of-the-art algorithms, and we analyze its performance when using our dataset.

[1]  Mohan M. Trivedi,et al.  Continuous Head Movement Estimator for Driver Assistance: Issues, Algorithms, and On-Road Evaluations , 2014, IEEE Transactions on Intelligent Transportation Systems.

[2]  Rainer Stiefelhagen,et al.  HeHOP: Highly efficient head orientation and position estimation , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[3]  Timothy F. Cootes,et al.  Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Horst Bischof,et al.  Alternating Regression Forests for Object Detection and Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[5]  Jing Xiao,et al.  Multi-View AAM Fitting and Construction , 2008, International Journal of Computer Vision.

[6]  Peter Robinson,et al.  3D Constrained Local Model for rigid and non-rigid facial tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Davis E. King,et al.  Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..

[8]  Arman Savran,et al.  Bosphorus Database for 3D Face Analysis , 2008, BIOID.

[9]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[10]  Thomas S. Huang,et al.  Interactive Facial Feature Localization , 2012, ECCV.

[11]  J. Crowley,et al.  Estimating Face orientation from Robust Detection of Salient Facial Structures , 2004 .

[12]  Martial Hebert,et al.  Cross-Stitch Networks for Multi-task Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Hao Li,et al.  Realtime performance-based facial animation , 2011, ACM Trans. Graph..

[14]  Mohan M. Trivedi,et al.  On the design and evaluation of robust head pose for visual user interfaces: algorithms, databases, and comparisons , 2012, AutomotiveUI.

[15]  Peter Robinson,et al.  OpenFace: An open source facial behavior analysis toolkit , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[16]  Horst Bischof,et al.  Hough Networks for Head Pose Estimation and Facial Feature Localization , 2014, BMVC.

[17]  Rainer Stiefelhagen,et al.  Real Time Head Model Creation and Head Pose Estimation on Consumer Depth Cameras , 2014, 2014 2nd International Conference on 3D Vision.

[18]  Luc Van Gool,et al.  Real-time face pose estimation from single range images , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Stefanos Zafeiriou,et al.  300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[20]  Takeo Kanade,et al.  Multi-PIE , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[21]  Nassir Navab,et al.  A Combined Generalized and Subject-Specific 3D Head Pose Estimation , 2015, 2015 International Conference on 3D Vision.

[22]  In-So Kweon,et al.  Real-Time Head Orientation from a Monocular Camera Using Deep Neural Network , 2014, ACCV.

[23]  Lucas Beyer,et al.  Biternion Nets: Continuous Head Pose Regression from Discrete Training Labels , 2015, GCPR.

[24]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[25]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Shiguang Shan,et al.  Stacked Progressive Auto-Encoders (SPAE) for Face Recognition Across Poses , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[28]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[30]  Luc Van Gool,et al.  Head Pose Estimation from Passive Stereo Images , 2009, SCIA.

[31]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[32]  Luc Van Gool,et al.  Random Forests for Real Time 3D Face Analysis , 2012, International Journal of Computer Vision.

[33]  Luc Van Gool,et al.  Real time head pose estimation with random regression forests , 2011, CVPR 2011.

[34]  Chi Fang,et al.  Head Pose Estimation Based on Random Forests for Multiclass Classification , 2010, 2010 20th International Conference on Pattern Recognition.

[35]  David J. Kriegman,et al.  Localizing parts of faces using a consensus of exemplars , 2011, CVPR.

[36]  Marco La Cascia,et al.  Fast, Reliable Head Tracking under Varying Illumination: An Approach Based on Registration of Texture-Mapped 3D Models , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[37]  Tinne Tuytelaars,et al.  All together now: Simultaneous Detection and Continuous Pose Estimation using a Hough Forest with Probabilistic Locally Enhanced Voting , 2014, BMVC.

[38]  Rafael Cabeza,et al.  A novel 2D/3D database with automatic face annotation for head tracking and pose estimation , 2016, Comput. Vis. Image Underst..

[39]  Reinhard Klette,et al.  Look at the Driver, Look at the Road: No Distraction! No Accident! , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.