Speech Dialogue With Facial Displays: Multimodal Human-Computer Conversation

Human face-to-face conversation is an ideal model for human-computer dialogue. One of the major features of face-to-face communication is its multiplicity of communication channels acting across multiple modalities. To realize natural multimodal dialogue, it is necessary to study how humans perceive information and to determine the kinds of information to which they are sensitive. The face is an independent communication channel that conveys emotional and conversational signals, encoded as facial expressions. We have developed an experimental system that integrates speech dialogue and facial animation to investigate the effect of introducing communicative facial expressions as a new modality in human-computer conversation. Our experiments have shown that facial expressions are helpful, especially upon a user's first contact with the system. We have also found that featuring facial expressions at an early stage improves subsequent interaction.
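To make the integration concrete, the following is a minimal sketch, not taken from the paper, of how a dialogue manager might select a communicative facial display for each conversational event before handing it to the animation component. All event names, display labels, and function names here are hypothetical illustrations of the general idea.

```python
# Hypothetical sketch: choosing a communicative facial display
# for each conversational situation. The categories and labels
# are illustrative assumptions, not the paper's implementation.

from enum import Enum, auto

class DialogueEvent(Enum):
    USER_SPEAKING = auto()       # system is listening to the user
    RECOGNITION_FAILED = auto()  # speech input was not understood
    PLANNING_REPLY = auto()      # system is "thinking"
    ANSWER_READY = auto()        # system is about to speak
    TASK_COMPLETED = auto()      # user's request has been satisfied

# Map conversational situations to facial displays that the
# facial-animation component would render.
FACIAL_DISPLAYS = {
    DialogueEvent.USER_SPEAKING: "attentive_gaze",
    DialogueEvent.RECOGNITION_FAILED: "puzzled_brow",
    DialogueEvent.PLANNING_REPLY: "thinking_look_away",
    DialogueEvent.ANSWER_READY: "raised_eyebrows",
    DialogueEvent.TASK_COMPLETED: "smile",
}

def select_display(event: DialogueEvent) -> str:
    """Return the facial display for a dialogue event, falling
    back to a neutral face for unmapped situations."""
    return FACIAL_DISPLAYS.get(event, "neutral")

if __name__ == "__main__":
    for ev in DialogueEvent:
        print(ev.name, "->", select_display(ev))
```

The point of the sketch is the design choice it illustrates: facial displays are driven by the state of the conversation itself (listening, thinking, failing to understand), not only by the emotional content of an utterance, which is what makes the face a conversational modality rather than a decoration.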
