Paper information - Model-based lip synchronization with automatically translated synthetic voice toward a multi-modal translation system

In this paper, we introduce a multi-modal English-to-Japanese and Japanese-to-English translation system that translates not only the speech but also the speaker's speech motion, synchronizing it with the translated speech. To retain the speaker's facial expression, we replace only the image of the speech organs with a synthesized one, generated from a three-dimensional wire-frame model that can be adapted to any speaker. Our approach enables image synthesis and translation with an extremely small database.

Satoshi Nakamura | Shigeo Morishima | Kazumasa Murai | Shin Ogata
