Extracting 3D facial animation parameters from multiview video clips (IEEE Computer Graphics and Applications)

A synthetic face’s behaviors must precisely conform to those of a real one. This is difficult because facial surface motion is nonrigid and nonlinear, so surface points move in complex, coupled ways. During speech, the motion trajectories between articulations, called coarticulation effects, are likewise nonlinear and depend on both the preceding and the succeeding articulations.

Performance-driven facial animation offers a direct and convincing way to handle these delicate variations: it animates a synthetic face using motion data captured from a performer. In modern computer graphics-based movies such as Final Fantasy, Shrek, and Toy Story, character motion designers used optical or magnetic trackers to capture the 3D trajectories of markers on a performer’s face. Such trackers can follow only a limited number of markers without interference, however, and the dozen or so markers placed on facial feature points cover the face only sparsely. To derive a vivid facial animation, animators must therefore manually adjust the uncovered areas. Other approaches, discussed in the “Related Work” sidebar, also present limitations in analyzing and synthesizing facial motion.

To tackle this problem, we propose an accurate and inexpensive procedure that estimates 3D facial motion parameters from mirror-reflected multiview video clips. We place two planar mirrors near the subject’s cheeks and use a single camera to capture the markers’ front and side view images simultaneously; each mirror acts as a virtual camera, so one physical camera yields three views of every marker. We also propose a novel closed-form linear algorithm that reconstructs 3D positions from real versus mirrored point correspondences in an uncalibrated environment. Figure 1 shows such a reconstruction.
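The paper’s closed-form uncalibrated algorithm is not reproduced here, but the underlying geometry can be illustrated with a minimal sketch. A planar mirror reflects the scene through a Householder transform, so a marker’s mirror image at pixel (u, v) satisfies (u, v) ~ P·M·X, making P·M act as a virtual camera observing the real point X. The sketch below assumes, unlike the paper’s setting, a calibrated pinhole camera and known mirror planes, and uses standard linear (DLT) triangulation; the intrinsics K, the plane parameters, and all names are illustrative.

```python
import numpy as np

def reflection_matrix(n, d):
    """4x4 homogeneous reflection about the mirror plane n.x + d = 0
    (n need not be unit length; it is normalized here)."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    M = np.eye(4)
    M[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)   # Householder reflection
    M[:3, 3] = -2.0 * d * n
    return M

def triangulate(cams, pixels):
    """Linear (DLT) triangulation of one 3D point from its projections.
    cams: list of 3x4 projection matrices; pixels: matching (u, v) coords."""
    rows = []
    for P, (u, v) in zip(cams, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]                                     # null vector, up to scale
    return X[:3] / X[3]

# Hypothetical setup: camera at the origin, two mirror planes near the cheeks.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])     # real camera
M_left  = reflection_matrix([ 1.0, 0.0, 0.1], -0.3)  # illustrative planes
M_right = reflection_matrix([-1.0, 0.0, 0.1], -0.3)

# One real view plus two mirror-induced virtual views of the same marker.
cams = [P, P @ M_left, P @ M_right]

X_true = np.array([0.05, -0.02, 1.2, 1.0])           # synthetic test marker
pixels = []
for C in cams:
    x = C @ X_true
    pixels.append((x[0] / x[2], x[1] / x[2]))

print(triangulate(cams, pixels))                     # ~ [0.05, -0.02, 1.2]
```

In the paper’s actual setting the camera and mirror poses are not calibrated in advance; its closed-form linear algorithm recovers the geometry directly from the real-versus-mirrored point correspondences, which this toy sketch sidesteps by assuming K and the mirror planes are known.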
