HeadFusion: 360° Head Pose Tracking Combining 3D Morphable Model and 3D Reconstruction

Head pose estimation is a fundamental task for face and social related research. Although 3D morphable model (3DMM) based methods relying on depth information usually achieve accurate results, they usually require frontal or mid-profile poses which preclude a large set of applications where such conditions can not be garanteed, like monitoring natural interactions from fixed sensors placed in the environment. A major reason is that 3DMM models usually only cover the face region. In this paper, we present a framework which combines the strengths of a 3DMM model fitted online with a prior-free reconstruction of a 3D full head model providing support for pose estimation from any viewpoint. In addition, we also proposes a symmetry regularizer for accurate 3DMM fitting under partial observations, and exploit visual tracking to address natural head dynamics with fast accelerations. Extensive experiments show that our method achieves state-of-the-art performance on the public BIWI dataset, as well as accurate and robust results on UbiPose, an annotated dataset of natural interactions that we make public and where adverse poses, occlusions or fast motions regularly occur.

[1]  Thomas Vetter,et al.  Estimating Coloured 3D Face Models from Single Images: An Example Based Approach , 1998, ECCV.

[2]  Thomas Vetter,et al.  A morphable model for the synthesis of 3D faces , 1999, SIGGRAPH.

[3]  Dieter Fox,et al.  DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Rin-ichiro Taniguchi,et al.  Augmented Blendshapes for Real-Time Simultaneous 3D Head Modeling and Facial Motion Capture , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jean-Marc Odobez,et al.  Gaze estimation from multimodal Kinect data , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[6]  Tim Weyrich,et al.  Real-Time 3D Reconstruction in Dynamic Scenes Using Point-Based Fusion , 2013, 2013 International Conference on 3D Vision.

[7]  Radek Grzeszczuk,et al.  A data-driven model for monocular face tracking , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[8]  Xiaoming Liu,et al.  Large-Pose Face Alignment via CNN-Based Dense 3D Model Fitting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[10]  Mario Fritz,et al.  It’s Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[11]  William E. Lorensen,et al.  Marching cubes: A high resolution 3D surface construction algorithm , 1987, SIGGRAPH.

[12]  Peter Robinson,et al.  Constrained Local Neural Fields for Robust Facial Landmark Detection in the Wild , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[13]  Jean-Marc Odobez,et al.  Gaze Estimation in the 3D Space Using RGB-D Sensors , 2015, International Journal of Computer Vision.

[14]  David Cristinacce,et al.  Automatic feature localisation with constrained local models , 2008, Pattern Recognit..

[15]  Jean-Marc Odobez,et al.  Robust and Accurate 3D Head Pose Estimation through 3DMM and Online Head Model Reconstruction , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[16]  Yuning Jiang,et al.  Extensive Facial Landmark Localization with Coarse-to-Fine Convolutional Network Cascade , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[17]  Marc Levoy,et al.  A volumetric method for building complex models from range images , 1996, SIGGRAPH.

[18]  Josephine Sullivan,et al.  One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.