An Unsupervised Approach for 3D Face Reconstruction from a Single Depth Image

In this paper, we propose a convolutional encoder network that learns to map a noisy depth image to an expressive 3D facial model. We formulate the task as an embedding problem and train the network in an unsupervised manner by exploiting the consistent fitting between the 3D mesh and the depth image. Adopting a 3DMM-based representation, we embed depth images into code vectors encoding facial identity, expression, and pose. Lacking the semantic textural cues available in RGB images, we instead exploit geometric and contextual constraints in both the depth image and the 3D surface to obtain a reliable mapping, combining a multi-level filtered point-cloud pyramid with semantic adaptive weighting for fitting. The proposed system enables expressive 3D face completion and reconstruction under poor illumination from a single noisy depth image. It also establishes a dense correspondence between the depth image and the 3D statistical deformable mesh, facilitating landmark localization and feature segmentation on depth images.
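To make the 3DMM-based representation concrete, the following is a minimal sketch of how code vectors for identity, expression, and pose could be decoded into a posed 3D face mesh. All dimensions, basis matrices, and function names here are illustrative assumptions (real morphable models such as the Basel Face Model use far larger bases learned from scan data), not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not the paper's values).
n_vertices = 1000          # real 3DMMs typically have tens of thousands
n_id, n_expr = 80, 29      # identity / expression code lengths

# Hypothetical statistical model: mean shape plus linear identity
# and expression bases, flattened as (3 * n_vertices,) vectors.
mean_shape = rng.standard_normal(3 * n_vertices)
id_basis = rng.standard_normal((3 * n_vertices, n_id))
expr_basis = rng.standard_normal((3 * n_vertices, n_expr))

def decode_face(id_code, expr_code, rotation, translation):
    """Decode 3DMM code vectors into a posed mesh of shape (n_vertices, 3)."""
    # Linear statistical deformable model.
    shape = mean_shape + id_basis @ id_code + expr_basis @ expr_code
    verts = shape.reshape(-1, 3)
    # Apply the rigid pose code: v' = R v + t for every vertex.
    return verts @ rotation.T + translation

# With zero codes and an identity pose, the decoder reproduces the mean shape.
verts = decode_face(np.zeros(n_id), np.zeros(n_expr), np.eye(3), np.zeros(3))
print(verts.shape)  # (1000, 3)
```

In an unsupervised setup like the one described, an encoder network would regress these code vectors from the depth image, and the fitting loss would compare the decoded mesh against the back-projected depth point cloud rather than against ground-truth labels.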