Human-computer interaction and animation system for simple interfacing to virtual environments

Presents MARTI (Man-machine Animation Real-Time Interface) for simple interfacing to virtual environments. The system is designed for automated special-effect animation and human-computer interaction, and can immerse people, irrespective of their expertise, in new technologies. Previous research has used image recognition systems to extract control signals for lip synchronisation; however, the analysis requires that the key articulatory points be highlighted. Researchers have also considered acoustic soundtracks, but these systems are limited by the accuracy of their speech recognition and must be trained on a single animator's voice. Furthermore, they do not provide timing information, which prevents accurate synchronisation. MARTI overcomes these limitations, allowing automatic lip synchronisation from a single speech input without the normal constraints of head-sets, reflectors and complex puppeteer control hardware. The system achieves a lip synchronisation performance in excess of 81% and does not require pre-training on the performer's voice, but instead operates with continuous speech in 'normal' non-laboratory conditions. Furthermore, the system returns timing information, and is invariant to regional accents and dialects, race, age and gender. MARTI introduces novel research from a number of engineering fields in order to realise the first natural interface and animation system capable of high performance for real users and real-world applications.
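As a purely illustrative sketch (not the authors' implementation), the speech-driven lip synchronisation the abstract describes can be pictured as mapping time-stamped phoneme output from a speaker-independent recogniser onto mouth-shape (viseme) keyframes. The phoneme symbols, viseme table, timing format and function names below are assumptions made for illustration only.

```python
# Hypothetical sketch: convert timed phoneme output from a speech
# recogniser into viseme keyframes for lip-synchronised animation.
# The phoneme set, viseme table and timing format are assumptions,
# not MARTI's actual design.

from dataclasses import dataclass

# Assumed many-to-one phoneme-to-viseme mapping (visemes are the
# visually distinct mouth shapes an animator needs to key).
PHONEME_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "aa": "open_wide", "ae": "open_wide",
    "iy": "spread", "ih": "spread",
    "uw": "rounded", "ow": "rounded",
    "sil": "rest",
}

@dataclass
class PhonemeEvent:
    phoneme: str      # recogniser output symbol (assumed format)
    start_ms: float   # onset time returned by the recogniser
    end_ms: float     # offset time

def phonemes_to_keyframes(events):
    """Convert timed phoneme events into (time_ms, viseme) keyframes,
    collapsing consecutive identical visemes so the face model only
    receives mouth-shape changes."""
    keyframes = []
    last_viseme = None
    for ev in events:
        viseme = PHONEME_TO_VISEME.get(ev.phoneme, "rest")
        if viseme != last_viseme:
            keyframes.append((ev.start_ms, viseme))
            last_viseme = viseme
    return keyframes

if __name__ == "__main__":
    # Example recogniser output for the word "map": /m ae p/ with timings.
    demo = [
        PhonemeEvent("sil", 0, 120),
        PhonemeEvent("m", 120, 200),
        PhonemeEvent("ae", 200, 380),
        PhonemeEvent("p", 380, 460),
    ]
    for t, v in phonemes_to_keyframes(demo):
        print(f"{t:6.0f} ms -> {v}")
```

The key point the sketch illustrates is that keyframes inherit their timestamps directly from the recogniser output, which is why timing information from the speech front end is essential for accurate synchronisation.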
