POSEidon: Face-from-Depth for Driver Pose Estimation

Fast and accurate upper-body and head pose estimation is a key task for automatic monitoring of driver attention, a challenging context characterized by severe illumination changes, occlusions and extreme poses. In this work, we present a new deep learning framework for head localization and pose estimation on depth images. The core of the proposal is a regression neural network, called POSEidon, which is composed of three independent convolutional nets followed by a fusion layer and is specifically designed to estimate pose from depth. In addition, to recover the intrinsic value of face appearance for understanding head position and orientation, we propose a new Face-from-Depth model that learns to reconstruct face images from depth. The face reconstruction results are qualitatively impressive. We test the proposed framework on two public datasets, namely Biwi Kinect Head Pose and ICT-3DHP, and on Pandora, a new challenging dataset mainly inspired by the automotive setup. Results show that our method outperforms all recent state-of-the-art works while running in real time at more than 30 frames per second.
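
The "three independent nets followed by a fusion layer" structure can be illustrated with a toy sketch. This is not the POSEidon implementation: the abstract does not specify which input feeds each branch, how the fusion is done, or the layer sizes. The sketch assumes fusion by concatenation, replaces each convolutional branch with a single linear map plus ReLU, and regresses three pose values. All names and sizes (`branch_features`, `fuse_and_regress`, `H`, `W`, `F`) are hypothetical.

```python
import random


def branch_features(image, weights):
    """Stand-in for one convolutional branch: one linear map followed by ReLU."""
    flat = [pixel for row in image for pixel in row]
    return [max(0.0, sum(w * x for w, x in zip(w_row, flat))) for w_row in weights]


def fuse_and_regress(feature_sets, fusion_weights):
    """Concatenate the branch features (assumed fusion) and regress pose values."""
    fused = [f for feats in feature_sets for f in feats]
    return [sum(w * x for w, x in zip(w_row, fused)) for w_row in fusion_weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
H, W, F = 4, 4, 8  # toy image size and per-branch feature size (hypothetical)

# Three placeholder input images, one per branch.
inputs = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(3)]
branch_weights = [random_matrix(F, H * W, rng) for _ in range(3)]
fusion_weights = random_matrix(3, 3 * F, rng)  # 3 outputs, e.g. three pose angles

# Each branch processes its own input independently; only the fusion step mixes them.
features = [branch_features(img, w) for img, w in zip(inputs, branch_weights)]
pose = fuse_and_regress(features, fusion_weights)
print(pose)
```

The point of the sketch is the data flow only: the branches share no weights and never see each other's input, and the regression output depends on all three through the fused feature vector.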
