Speech-driven cartoon animation with emotions

In this paper, we present a cartoon face animation system for multimedia HCI applications. We animate face cartoons not only from input speech, but also from emotions derived from the speech signal. Using a corpus of over 700 utterances from different speakers, we have trained support vector machines (SVMs) to recognize four categories of emotion: neutral, happiness, anger, and sadness. Given each input speech phrase, we identify its emotional content as a mixture of all four emotions, rather than classifying it into a single emotion. Facial expressions are then generated from the recovered emotions for each phrase by morphing cartoon templates that correspond to the different emotions. To ensure smooth transitions in the animation, we apply low-pass filtering to the recovered (and possibly jumpy) emotion sequence. In addition, lip-syncing produces the lip movement from speech by recovering a statistical audio-visual mapping. Experimental results demonstrate that the cartoon animation sequences generated by our system are of convincing quality.
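The pipeline the abstract describes (soft SVM emotion recognition, low-pass smoothing of the per-phrase emotion sequence, and template morphing) can be sketched compactly. The following is a minimal illustration, not the authors' implementation: the feature matrices, the smoothing `window`, and the control-point template format are all assumptions, and scikit-learn's `SVC` stands in for whatever SVM training the paper actually used.

```python
import numpy as np
from sklearn.svm import SVC

EMOTIONS = ["neutral", "happiness", "anger", "sadness"]

# --- 1. Train an SVM emotion classifier on prosodic features. ---
# X_train: (n_utterances, n_features) feature vectors (e.g. pitch and
# energy statistics); y_train: integer emotion labels in [0, 4).
# Both are placeholders for whatever the 700-utterance corpus provides.
def train_emotion_svm(X_train, y_train):
    clf = SVC(kernel="rbf", probability=True)  # enables soft outputs
    clf.fit(X_train, y_train)
    return clf

# --- 2. Describe each phrase as a mixture over the four emotions. ---
def emotion_mixture(clf, X_phrases):
    # predict_proba returns one row of class probabilities per phrase,
    # i.e. an emotion "mixture" rather than a single hard label.
    return clf.predict_proba(X_phrases)  # shape (n_phrases, 4)

# --- 3. Low-pass filter the per-phrase emotion sequence. ---
def smooth_emotions(mixtures, window=3):
    # A moving average over time suppresses jumpy transitions between
    # consecutive phrases; rows are renormalized to sum to one again.
    kernel = np.ones(window) / window
    smoothed = np.column_stack(
        [np.convolve(mixtures[:, k], kernel, mode="same")
         for k in range(mixtures.shape[1])]
    )
    return smoothed / smoothed.sum(axis=1, keepdims=True)

# --- 4. Morph cartoon templates with the smoothed weights. ---
def morph_face(weights, templates):
    # templates: (4, n_points, 2) control points of one cartoon face
    # drawn in each of the four emotions; the displayed face is their
    # convex combination under the current emotion weights (shape (4,)).
    return np.tensordot(weights, templates, axes=1)
```

Using `predict_proba` rather than a hard `predict` is what yields the four-way mixture, and renormalizing after the moving average keeps the weights a convex combination, so the blended face stays within the span of the four templates.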
