Real-Time Multimodal Human–Avatar Interaction

This paper presents a novel real-time multimodal human–avatar interaction (RTM-HAI) framework with vision-based remote animation control (RAC). The framework is designed for both mobile and desktop avatar-based human–machine or human–human visual communication in real-world scenarios. Using 3-D components stored in the Java Mobile 3D Graphics (M3G) file format, avatar models can be flexibly constructed and customized on the fly on any mobile device or system that supports the M3G standard. For the RAC head tracker, we propose a 2-D real-time face detection/tracking strategy organized as an interactive loop, in which detection and tracking complement each other for efficient and reliable face localization that tolerates extreme user movement. With the face location robustly tracked, the RAC head tracker selects a main user and estimates the user's head roll, tilt, yaw, scale, and horizontal and vertical motion to generate avatar animation parameters. These animation parameters can be consumed either locally or remotely and can be transmitted over the network via sockets. In addition, the framework integrates audio-visual analysis and synthesis modules to realize multichannel runtime animation, visual text-to-speech (TTS), and real-time viseme detection and rendering. The framework is recognized as an effective design for future realistic industrial products such as humanoid kiosks and human-to-human mobile communication.
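The detection/tracking interactive loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `detect` and `track` functions below are hypothetical stubs standing in for a full-frame face detector (e.g., a boosted cascade) and a cheap local tracker, and the dictionary-based "frames" are placeholders for real video frames.

```python
def detect(frame):
    """Stub for a full-frame face detector (e.g., a boosted cascade).
    Returns a face box (x, y, w, h) or None if no face is found."""
    return frame.get("face")

def track(frame, prev_box):
    """Stub for a local tracker that searches only near prev_box.
    Returns (box, confidence); confidence 0.0 means the track is lost."""
    face = frame.get("face")
    if face is None:
        return prev_box, 0.0
    return face, 1.0

def detect_track_loop(frames, conf_threshold=0.5):
    """Interactive loop: tracking carries the face location frame to
    frame; detection re-initializes tracking whenever the tracker's
    confidence drops, so the two stages complement each other."""
    box = None
    locations = []
    for frame in frames:
        if box is None:
            # No current track: (re)acquire the face with full detection.
            box = detect(frame)
        else:
            # Track from the previous location (cheaper than detection).
            box, conf = track(frame, box)
            if conf < conf_threshold:
                # Tracker lost the face: fall back to the detector.
                box = detect(frame)
        locations.append(box)
    return locations
```

In a real system the loop would also feed the tracked box to a head-pose estimator to produce the roll/tilt/yaw/scale/translation animation parameters; the complementary structure shown here is what lets the tracker survive fast movement while the detector recovers from tracking failures.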
