How can technology support and invent new ways for geographically separated people to communicate and share experiences? The telephone is one of the oldest such methods, and it is still widely used. As the Internet has evolved, however, novel communication methods have emerged. Among them are video telephony and videoconferencing technologies, including systems such as FreeWalk,[12] which supports small, casual group meetings in a virtual space with real-time video and audio of the participants. Far more popular, however, are real-time chat applications such as ICQ and MSN Messenger, which let people exchange text- or voice-based messages.

One reason chat applications are more popular than real-time video applications is that many people are reluctant to show their faces to their communication partners, particularly in real time. People also tend to communicate more freely in informal settings when they can hide their identity. However, as the use of emoticons suggests, communication without nonverbal information such as facial expressions can be monotonous.

To address this, we have developed a system[10] that animates 3D facial agents based on real-time facial expression analysis techniques[3] and on research into synthesizing facial expressions and text-to-speech capabilities.[5] (The "Related Work" sidebar discusses this research in more detail.) Our system combines visual, auditory, and text interfaces into one coherent multimodal chat experience. Users represent themselves with agents they select from a predefined group. When a user shows a particular expression while typing text, the 3D agent at the receiving end speaks the message aloud, replays the recognized facial expression sequence, and augments the synthesized voice with appropriate emotional content. Because the visual data exchange is based on the MPEG-4 high-level Facial Animation Parameter for facial expressions (FAP 2) rather than on real-time video, our method requires very little bandwidth, as the sketch below illustrates. The "Web Extras" sidebar offers links to video files of our system at work.

Our system

Our system consists of three main modules: a real-time facial expression analysis component, which calculates the MPEG-4 FAP 2; an affective 3D agent with facial expression synthesis and text-to-speech capabilities; and a communication module. We have implemented a prototype to explore attractive Internet communication methods, and have also experimented with using these new modalities in chat communication.

How it works

As Figure 1 (on p. 22) shows, our system captures the user's current face image using a …
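To make the bandwidth argument concrete, consider what a single chat turn needs to carry: the typed text plus a short sequence of high-level expression parameters, rather than video frames. The Python sketch below is a minimal illustration of such a payload; the ChatMessage and ExpressionFrame classes, their field names, and the JSON encoding are our own illustrative assumptions, not the system's actual wire format. The FAP 2 convention of blending two basic expressions, each with an intensity in the range 0-63, follows the MPEG-4 standard.

```python
# A minimal sketch of a low-bandwidth chat payload carrying text plus
# MPEG-4 high-level expression parameters (FAP 2). Class names, field
# layout, and JSON serialization are illustrative assumptions, not the
# actual protocol used by the system described in the article.
import json
from dataclasses import dataclass, asdict

# MPEG-4 FAP 2 blends two basic expressions, each with an intensity.
EXPRESSIONS = ("neutral", "joy", "sadness", "anger", "fear", "disgust", "surprise")

@dataclass
class ExpressionFrame:
    expression1: int  # index into EXPRESSIONS
    intensity1: int   # 0..63, per the MPEG-4 FAP 2 convention
    expression2: int  # second expression for blending
    intensity2: int

@dataclass
class ChatMessage:
    sender: str
    text: str                      # spoken aloud by the receiver's agent
    frames: list                   # ExpressionFrame sequence to replay

    def serialize(self) -> bytes:
        """Encode as JSON: a few bytes per frame instead of video."""
        payload = {
            "sender": self.sender,
            "text": self.text,
            "frames": [asdict(f) for f in self.frames],
        }
        return json.dumps(payload).encode("utf-8")

# Example: a happy "Hello!" with a short two-frame expression sequence.
msg = ChatMessage(
    sender="alice",
    text="Hello!",
    frames=[ExpressionFrame(1, 40, 0, 0), ExpressionFrame(1, 55, 0, 0)],
)
print(len(msg.serialize()), "bytes")  # tens of bytes, not a video stream
```

Because each frame is just four small integers, even a message replaying a long expression sequence amounts to a few hundred bytes, whereas real-time video of comparable expressiveness would require several orders of magnitude more bandwidth.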
References

[1] M. Ishizuka et al., "Making the Web Emotional: Authoring Multimodal Presentations Using a Synthetic 3D Agent," 2001.
[2] J.-L. Dugelay et al., "Face Tracking and Realistic Animations for Telecommunicant Clones," IEEE MultiMedia, 2000.
[3] N. P. Chanrasiri, "Real Time Facial Expression Recognition System with Applications to Facial Animation in MPEG-4," 2001.
[4] M. M. Cohen et al., "Modeling Coarticulation in Synthetic Visual Speech," 1993.
[5] M. Ishizuka et al., "A 3D Agent with Synthetic Face and Semiautonomous Behavior for Multimodal Presentations," 2001.
[6] L. S. Davis et al., "Recognizing Human Facial Expressions from Long Image Sequences Using Optical Flow," IEEE Trans. Pattern Analysis and Machine Intelligence, 1996.
[7] M. Yachida et al., "Facial Expression Recognition and Its Degree Estimation," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition (CVPR), 1997.
[8] K. Aizawa et al., "Analysis and Synthesis of Facial Image Sequences in Model-Based Image Coding," IEEE Trans. Circuits and Systems for Video Technology, 1994.
[9] M. Steedman et al., "Generating Facial Expressions for Speech," Cognitive Science, 1996.
[10] T. Naemura et al., "Communication over the Internet Using a 3D Agent with Real-Time Facial Expression Analysis, Synthesis and Text to Speech Capabilities," Proc. 8th Int'l Conf. Communication Systems (ICCS 2002), 2002.
[11] A. M. Tekalp et al., "Face and 2-D Mesh Animation in MPEG-4," Signal Processing: Image Communication, 2000.
[12] H. Nakanishi et al., "FreeWalk: A 3D Virtual Space for Casual Meetings," IEEE MultiMedia, 1999.
[13] M. Steedman et al., "Animated Conversation: Rule-Based Generation of Facial Expression, Gesture & Spoken Intonation for Multiple Conversational Agents," Proc. SIGGRAPH, 1994.