Voice puppetry

We introduce a method for predicting a control signal from another related signal, and apply it to voice puppetry: generating full facial animation from the expressive information in an audio track. The voice puppet learns a facial control model from computer vision of real facial behavior, automatically incorporating vocal and facial dynamics such as co-articulation. Animation is produced by using audio to drive the model, which induces a probability distribution over the manifold of possible facial motions. We present a linear-time closed-form solution for the most probable trajectory over this manifold. The output is a series of facial control parameters, suitable for driving many different kinds of animation ranging from video-realistic image warps to 3D cartoon characters.

CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Animation; I.2.9 [Artificial Intelligence]: Robotics—Kinematics and Dynamics; I.4.8 [Image Processing and Computer Vision]: Scene Analysis—Time-varying images; G.3 [Mathematics of Computing]: Probability and Statistics—Time series analysis; E.4 [Data]: Coding and Information Theory—Data compaction and compression; J.5 [Computer Applications]: Arts and Humanities—Performing Arts
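To make the "most probable trajectory" idea concrete, the following is a minimal, illustrative sketch of the standard dynamic-programming (Viterbi) decode for a hidden Markov model: given a sequence of audio-derived observations, it finds the single most likely sequence of hidden states in time linear in the sequence length. The state names (`closed`, `open`) and observation labels (`quiet`, `loud`) are invented toy stand-ins for facial configurations and audio features; the paper's actual closed-form solution operates on continuous facial control trajectories and is not this discrete decoder.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence.

    Runs in O(T * N^2) for T observations and N states, i.e. linear
    in the length of the sequence. Log-probabilities avoid underflow.
    """
    # Initialize with the first observation.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    # Recurse forward, remembering the best predecessor of each state.
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = (V[t - 1][best_prev]
                       + math.log(trans_p[best_prev][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = best_prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy model: two hypothetical mouth states driven by a loudness feature.
states = ["closed", "open"]
start = {"closed": 0.8, "open": 0.2}
trans = {"closed": {"closed": 0.7, "open": 0.3},
         "open":   {"closed": 0.4, "open": 0.6}}
emit = {"closed": {"quiet": 0.9, "loud": 0.1},
        "open":   {"quiet": 0.2, "loud": 0.8}}

path = viterbi(["quiet", "loud", "loud"], states, start, trans, emit)
# -> ["closed", "open", "open"]
```

The decoded path is a discrete analogue of the output described above: a time series of control states that can drive an animation channel frame by frame.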
