Binocular photometric stereo acquisition and reconstruction for 3D talking head applications

To render a high-quality, versatile 3D talking head, a stable, high-frame-rate audio-visual (AV) data acquisition system is constructed. It captures the 3D position, surface orientation, and albedo texture of the talking head from video images, along with the corresponding speech signals. The system consists of a computer-controlled LED lighting subsystem; high-speed stereo cameras; a microphone; and a computer for synchronous recording of multi-stream AV data. The collected image data is processed through a binocular photometric stereo 3D reconstruction pipeline. The pipeline automatically segments the face; computes the depth map with binocular stereo; computes the normal map with photometric stereo; generates the albedo texture; and finally constructs a highly detailed 3D model with depth and normal cues as constraints. With the data collected by this system, we can capture high-quality dynamic facial performance synchronized with the subject's speech.
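The abstract does not say how the normal map and albedo are computed. As a rough illustration only, the following sketch shows the classic Lambertian formulation of photometric stereo for a single pixel lit from three known directions. The function names `solve3x3` and `photometric_stereo_pixel`, the three-light setup, and the Lambertian, shadow-free assumptions are all illustrative and are not taken from the paper.

```python
import math


def solve3x3(A, b):
    """Solve A x = b for a 3x3 matrix A using Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(A)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    x = []
    for col in range(3):
        # Replace one column of A with b, as Cramer's rule prescribes.
        m = [[b[r] if c == col else A[r][c] for c in range(3)]
             for r in range(3)]
        x.append(det(m) / d)
    return x


def photometric_stereo_pixel(lights, intensities):
    """Recover (albedo, unit normal) for one pixel from three observations.

    Assumption (not from the paper): a Lambertian surface, distant lights,
    and no shadows, so that intensity_k = albedo * dot(light_k, normal).
    """
    g = solve3x3(lights, intensities)  # g = albedo * normal
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0.0:
        return 0.0, [0.0, 0.0, 0.0]
    normal = [v / albedo for v in g]
    return albedo, normal


# Synthetic check: render one pixel with a known normal and albedo,
# then recover both from the three intensities.
lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
true_normal = [0.0, 0.6, 0.8]
true_albedo = 0.5
intensities = [true_albedo * sum(l * n for l, n in zip(light, true_normal))
               for light in lights]
albedo, normal = photometric_stereo_pixel(lights, intensities)
```

With more than three lights, the same per-pixel system would typically be solved by least squares rather than by direct inversion. A real capture system would also need to handle shadows, specular highlights, and subject motion between differently lit frames, none of which this sketch addresses.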

Yasuyuki Matsushita | Frank K. Soong | Lijuan Wang | Bojun Huang | Chaoyang Wang | Magnetro Chen