High-Detail 3D Capture and Non-sequential Alignment of Facial Performance

This paper presents a novel system for the 3D capture of facial performance using standard video and lighting equipment. The mesh of an actor's face is tracked non-sequentially throughout a performance using multi-view image sequences. A minimum spanning tree calculated in expression dissimilarity space defines a traversal of the sequences that is optimal with respect to error accumulation. Robust patch-based frame-to-frame surface alignment, combined with this optimal traversal, significantly reduces drift compared to previous sequential techniques. Multi-path temporal fusion resolves inconsistencies between different alignment paths and yields a final mesh sequence that is temporally consistent. The surface tracking framework is coupled with photometric stereo using coloured lights, which captures metrically correct skin geometry. High-detail UV normal maps, corrected for shadow and bias artefacts, augment the temporally consistent mesh sequence. Evaluation on challenging performances by several actors demonstrates the acquisition of subtle skin dynamics and minimal drift over long sequences. A quantitative comparison to a state-of-the-art system shows similar quality of temporal alignment.
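The non-sequential traversal can be illustrated with a minimal sketch. It assumes that a symmetric matrix of pairwise expression dissimilarities between frames is already available; the abstract does not say how that dissimilarity is measured, so the values below are invented for illustration. The function name `mst_traversal` and the choice of Prim's algorithm are likewise assumptions, not details taken from the paper.

```python
import heapq


def mst_traversal(dissimilarity, root=0):
    """Return (parent, child) frame pairs in the order Prim's algorithm adds them.

    Each pair is one frame-to-frame alignment step. The parent frame has
    always been reached before its child, so the list is a valid order in
    which to propagate the mesh from the root frame.
    """
    n = len(dissimilarity)
    visited = [False] * n
    visited[root] = True
    # Candidate edges leaving the visited set, ordered by dissimilarity.
    heap = [(dissimilarity[root][j], root, j) for j in range(n) if j != root]
    heapq.heapify(heap)
    edges = []
    while heap and len(edges) < n - 1:
        _, parent, child = heapq.heappop(heap)
        if visited[child]:
            continue  # stale entry: child was already reached via a cheaper edge
        visited[child] = True
        edges.append((parent, child))
        for j in range(n):
            if not visited[j]:
                heapq.heappush(heap, (dissimilarity[child][j], child, j))
    return edges


# Hypothetical dissimilarities between four frames (illustrative values only).
D = [
    [0.0, 1.0, 4.0, 3.0],
    [1.0, 0.0, 2.0, 5.0],
    [4.0, 2.0, 0.0, 1.5],
    [3.0, 5.0, 1.5, 0.0],
]

for parent, child in mst_traversal(D, root=0):
    print(f"align frame {parent} -> frame {child}")
```

In this sketch every alignment step links two frames with similar expressions, which is the intuition behind reducing accumulated error relative to a strictly sequential pass through the frames.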
