Perceptive animated interfaces: first steps toward a new paradigm for human-computer interaction

This paper presents a vision of the near future in which computer interaction is characterized by natural face-to-face conversations with lifelike characters that speak, emote, and gesture. These animated agents will converse with people much as people converse with capable human assistants, across a variety of focused applications. Despite the research advances required to realize this vision, and the lack of strong experimental evidence that animated agents improve human-computer interaction, we argue that initial prototypes of perceptive animated interfaces can be developed today, and that the resulting systems will provide more effective and engaging communication experiences than existing systems. In support of this hypothesis, we first describe initial experiments using an animated character to teach speech and language skills to children with hearing impairments, and classroom subjects and social skills to children with autism spectrum disorder. We then show how existing dialogue system architectures can be transformed into perceptive animated interfaces by integrating computer vision and animation capabilities. We conclude by describing the Colorado Literacy Tutor, a computer-based literacy program that provides an ideal testbed for research and development of perceptive animated interfaces, and consider the next steps required to realize the vision.
