Unsupervised learning for speech motion editing

We present a new method for editing speech-related facial motions. Our method uses an unsupervised learning technique, Independent Component Analysis (ICA), to extract a set of meaningful parameters without any annotation of the data. With ICA, we solve a blind source separation problem and describe the original data as a linear combination of two sources: one captures content (speech) and the other captures style (emotion). By manipulating the independent components, we can edit the motions in intuitive ways.
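To make the decomposition concrete, the following is a minimal sketch of the idea using scikit-learn's FastICA. The marker layout, the number of components, and the choice of component 3 as the style (emotion) source are illustrative assumptions, not details from the paper; in practice the components would have to be identified by inspection or against labeled clips.

```python
import numpy as np
from sklearn.decomposition import FastICA  # assumption: FastICA as the ICA solver

# Hypothetical input: motion-capture data, one row per frame,
# one column per facial marker coordinate (sizes are illustrative).
n_frames, n_channels = 600, 90
motion = np.random.randn(n_frames, n_channels)  # placeholder for real marker data

# Decompose the motion into a small number of independent components.
ica = FastICA(n_components=8, random_state=0)
sources = ica.fit_transform(motion)  # (n_frames, 8) independent sources
mixing = ica.mixing_                 # (n_channels, 8) mixing matrix

# Editing sketch: suppose component 3 has been identified as tracking
# emotional style while the others carry speech content (an assumption
# for this example). Scaling it exaggerates the emotion while leaving
# the speech-driven components untouched.
edited_sources = sources.copy()
edited_sources[:, 3] *= 1.5

# Reconstruct the edited motion as a linear combination of the sources.
edited_motion = edited_sources @ mixing.T + ica.mean_
```

The same reconstruction step supports other edits, e.g. zeroing a style component to neutralize the emotion or swapping style components between two clips.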
