Unsupervised learning for speech motion editing

We present a new method for editing speech-related facial motions. Our method uses an unsupervised learning technique, Independent Component Analysis (ICA), to extract a set of meaningful parameters without any annotation of the data. With ICA, we solve a blind source separation problem and describe the original data as a linear combination of two sources: one captures content (speech) and the other captures style (emotion). By manipulating the independent components, we can edit the motions in intuitive ways.
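To make the decomposition concrete, the following is a minimal sketch of the idea using scikit-learn's FastICA. The marker layout, the number of components, and the choice of component 3 as the style (emotion) source are illustrative assumptions, not details from the paper; in practice the components would have to be identified by inspection or against labeled clips.

```python
import numpy as np
from sklearn.decomposition import FastICA  # assumption: FastICA as the ICA solver

# Hypothetical input: motion-capture data, one row per frame,
# one column per facial marker coordinate (sizes are illustrative).
n_frames, n_channels = 600, 90
motion = np.random.randn(n_frames, n_channels)  # placeholder for real marker data

# Decompose the motion into a small number of independent components.
ica = FastICA(n_components=8, random_state=0)
sources = ica.fit_transform(motion)  # (n_frames, 8) independent sources
mixing = ica.mixing_                 # (n_channels, 8) mixing matrix

# Editing sketch: suppose component 3 has been identified as tracking
# emotional style while the others carry speech content (an assumption
# for this example). Scaling it exaggerates the emotion while leaving
# the speech-driven components untouched.
edited_sources = sources.copy()
edited_sources[:, 3] *= 1.5

# Reconstruct the edited motion as a linear combination of the sources.
edited_motion = edited_sources @ mixing.T + ica.mean_
```

The same reconstruction step supports other edits, e.g. zeroing a style component to neutralize the emotion or swapping style components between two clips.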
