Representation Analysis and Synthesis of Lip Images Using Dimensionality Reduction

Understanding facial expressions in image sequences is an easy task for humans. Some of us are capable of lipreading by interpreting the motion of the mouth. Automatic lipreading by a computer, however, is a challenging task, with limited success so far. The inverse problem of synthesizing realistic lip movements is also highly non-trivial: today, the technology to automatically generate an image series that imitates natural postures is far from perfect.

We introduce a new framework for facial image representation, analysis and synthesis that focuses on the lower half of the face, specifically the mouth. It covers the interpretation and classification of facial expressions and visual speech recognition, as well as a procedure for synthesizing facial expressions that yields natural-looking mouth movements.

Our image analysis and synthesis processes are based on a parametrization of the set of images of mouth configurations. These images are represented as points on a two-dimensional flat manifold, which enables us to efficiently define the pronunciation of each word and thereby to analyze or synthesize the motion of the lips. We present examples of automatic lip motion synthesis and lipreading, and propose a generalization of our solution to the problem of lipreading different subjects.
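The abstract does not say how the flat two-dimensional parametrization is computed or how words are compared, so the following is only an illustrative sketch under our own assumptions, not the authors' method. It uses classical multidimensional scaling (MDS) on plain Euclidean pixel distances to place each flattened mouth image at a 2-D point, treats a spoken word as the trajectory of its frames in that plane, reads a word by dynamic time warping (DTW) against stored per-word trajectories, and synthesizes motion by picking, for each point of a target path, the stored frame nearest to it. The function names `classical_mds_2d`, `dtw`, `read_word` and `synthesize` are hypothetical.

```python
import math


def classical_mds_2d(frames, iters=1000):
    """Place each flattened mouth image at a 2-D point with classical MDS.

    Assumption: plain Euclidean pixel distances between frames. Returns one
    (x, y) tuple per frame, unique only up to rotation and reflection.
    """
    n = len(frames)
    # Squared Euclidean distances between every pair of frames.
    d2 = [[sum((p - q) ** 2 for p, q in zip(frames[i], frames[j])) for j in range(n)]
          for i in range(n)]
    row = [sum(r) / n for r in d2]
    total = sum(row) / n
    # Double centering turns squared distances into the Gram matrix of centred frames.
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + total) for j in range(n)] for i in range(n)]
    axes = []
    for _ in range(2):
        # Power iteration for the leading eigenvector of the (PSD) Gram matrix.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        # Deflate so the next pass finds the second-largest eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        axes.append([x * math.sqrt(max(lam, 0.0)) for x in v])
    return [(axes[0][i], axes[1][i]) for i in range(n)]


def dtw(a, b):
    """Dynamic-time-warping cost between two trajectories of 2-D points."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def read_word(observed, templates):
    """Lipreading sketch: return the word whose stored trajectory is DTW-closest."""
    return min(templates, key=lambda word: dtw(observed, templates[word]))


def synthesize(path, embedding):
    """Synthesis sketch: index of the stored frame nearest to each point of a 2-D path."""
    return [min(range(len(embedding)), key=lambda k: math.dist(p, embedding[k]))
            for p in path]
```

Because classical MDS is unique only up to rotation and reflection, the word templates and the observed sequence must live in the same embedding, for example by embedding training and test frames together. Classical MDS also reproduces a flat layout exactly only when the pairwise distances are those of a planar point set; on real lip images a nonlinear embedding or a different image distance may be needed.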