Three-dimensional face processing and its applications in biometrics

Face analysis research is evolving from two-dimensional (2D) to three-dimensional (3D) representations and from static to dynamic data, where multisample, multimodal and multiview methods improve the accuracy of both hard and soft biometric tasks, including face recognition and facial expression recognition. In this work, hardware/software systems are designed and implemented to build an extensive 3D static/dynamic face database. Based on the collected data, a fully automatic 3D face reconstruction algorithm is proposed to recover 3D face information, including both geometry and texture. The resulting 3D face model can be driven by Moving Picture Experts Group-4 (MPEG-4) facial animation parameters to render an emotive speech avatar. Thousands of virtual face images with different poses, illuminations and expressions can also be synthesized to enlarge the training set for robust face recognition. We evaluate the reconstruction result on multiview face recognition and demonstrate its effectiveness experimentally. Multiview facial expression recognition based on 2D/3D face data is also examined and improved by combining shape and texture information. The spatial and temporal ambiguity in the dynamic data is addressed with spatiotemporal features and multiple instance learning. Our experiments on spontaneous facial expression recognition and human action recognition show promising results.

The contributions of this thesis include: (1) large-scale 3D data collection and annotation, which resulted in the largest static 3D face database and the first audiovisual 3D dynamic face data collection system in the world; (2) a robust, fast and accurate 3D face reconstruction and animation algorithm, which can be used to create an emotive speech avatar; and (3) non-frontal-view face recognition, facial expression recognition and human action recognition, which are among the earliest efforts in this research area.