Real-Time Humanoid Avatar for Multimodal Human-Machine Interaction

A novel framework for multimodal human-machine and human-human interaction through real-time humanoid avatar communication is proposed for real-world mobile applications. It integrates audio-visual analysis and synthesis modules to provide real-time head tracking, multichannel and runtime animations, visual text-to-speech (TTS), and real-time viseme detection and rendering. The 3D avatar supports customized modeling for low-bit-rate virtual communication by adopting the M3G standard, and it supports MPEG-4 facial animation parameters (FAPs). A robust user head tracker, together with an associated head pose and motion estimation scheme, is developed to control avatar animation in real time at remote locations. The framework is recognized as an effective design for practical industrial products in human-to-human mobile communication.
