Facial Pose Estimation by Deep Learning from Label Distributions

Facial pose estimation has attracted considerable attention owing to its many practical applications, such as human-robot interaction, gaze estimation and driver monitoring. Meanwhile, end-to-end deep learning-based facial pose estimation is becoming increasingly popular. However, facial pose estimation faces a key challenge: the lack of sufficient training data for many poses, especially large ones. Inspired by the observation that faces under close poses look similar, we reformulate facial pose estimation as a label distribution learning problem, treating each face image as an example associated with a Gaussian label distribution rather than a single label, so that each training image also contributes to learning its neighboring poses. We construct a convolutional neural network, trained with a multi-loss function on the AFLW and 300W-LP datasets, that predicts facial poses directly from color images. Extensive experiments on several popular benchmarks, including AFLW2000, BIWI, AFLW and AFW, show that our approach has a significant advantage over other state-of-the-art methods.
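The label distribution reformulation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each pose angle (yaw, pitch or roll, handled independently) is discretized into 3° bins over [-99°, 99°], that the Gaussian label distribution uses a fixed standard deviation `sigma`, and that the "multi-loss" combines a Kullback-Leibler term on the predicted distribution with a squared-error term on the expected angle, weighted by a hypothetical `alpha`. The bin range, `sigma`, `alpha` and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def bin_centers(lo: float = -99.0, hi: float = 99.0, step: float = 3.0) -> List[float]:
    """Centers of equally spaced angle bins covering [lo, hi] (assumed binning)."""
    n = int(round((hi - lo) / step))
    return [lo + step * (i + 0.5) for i in range(n)]


def gaussian_label_distribution(angle: float, centers: Sequence[float],
                                sigma: float = 3.0) -> List[float]:
    """Discretized Gaussian centered on the ground-truth angle, normalized to sum to 1."""
    weights = [math.exp(-((c - angle) ** 2) / (2.0 * sigma ** 2)) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax turning raw network outputs into a distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(target: Sequence[float], pred: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(target || pred); bins with zero target mass contribute nothing."""
    return sum(t * (math.log(t) - math.log(max(p, eps)))
               for t, p in zip(target, pred) if t > 0.0)


def expected_angle(probs: Sequence[float], centers: Sequence[float]) -> float:
    """Continuous angle estimate: the expectation of the distribution over bin centers."""
    return sum(p * c for p, c in zip(probs, centers))


def multi_loss(logits: Sequence[float], angle: float, centers: Sequence[float],
               sigma: float = 3.0, alpha: float = 0.01) -> float:
    """Hypothetical multi-loss for one angle: KL divergence to the Gaussian
    label distribution plus alpha times the squared error of the expected angle."""
    target = gaussian_label_distribution(angle, centers, sigma)
    pred = softmax(logits)
    kl = kl_divergence(target, pred)
    reg = (expected_angle(pred, centers) - angle) ** 2
    return kl + alpha * reg


if __name__ == "__main__":
    centers = bin_centers()
    yaw = 30.0
    target = gaussian_label_distribution(yaw, centers)
    # The expectation of the target distribution recovers a value very close to yaw.
    print(round(expected_angle(target, centers), 3))
    # An uninformative (uniform) prediction is penalized by both loss terms.
    print(multi_loss([0.0] * len(centers), yaw, centers))
```

Because the target spreads mass over neighboring bins, a prediction that is off by one bin is penalized far less than one that is off by many bins, which is the property that lets scarce large-pose samples inform nearby poses. Reading out the expected angle rather than the arg-max bin gives a continuous estimate that is finer than the bin width.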
