Augmented transition networks (ATNs) for dialog control: A longitudinal study

ABSTRACT

Our research team has implemented over a dozen spoken natural language dialog systems in varied domains over the past decade. Each system uses the same underlying dialog controller, an augmented transition network (ATN), for maintaining a cohesive, natural conversation with the user. In this paper, we examine the evolution of our use of ATNs, present statistical analysis of the features of our ATNs, and discuss lessons learned.

KEY WORDS

Dialog, natural language processing, mixed initiative, augmented transition networks, virtual humans.

1. Introduction

Since approximately 1996, our research team has worked on a series of PC-based applications in which the user interacts with responsive virtual characters. Applications have ranged from trauma patient assessment [1] to learning military tank maintenance diagnostic skills [2] to gaining skills in avoiding non-response during field interviews [3]. In these applications, the computer simulates a person's behavior. Users interact with the virtual characters via voice, mouse, menu, and/or keyboard. We are certainly not alone in developing training, assessment, marketing, and other virtual human applications (e.g., [4,5,6,7,8,9,10,11]), but the breadth across domains and the consistency of the underlying architecture allow us to measure our system's performance longitudinally.

We have developed a dialog system architecture that enables users to engage in conversational dialogs with virtual humans and to see and hear their realistic responses [12]. As seen in Figure 1, among the components that underlie the architecture are a Language Processor and a Behavior Engine. The Language Processor accepts spoken input and maps this input to an underlying semantic representation, and then functions in reverse, mapping semantic representations to gestural and speech output. Our applications variously use spoken natural language interaction [2], text-based interaction, and menu-based interaction. The Behavior Engine maps Language Processor output and other environmental stimuli to virtual human behaviors; these behaviors include decision-making and problem solving, performing actions in the virtual world, and spoken dialog. The underlying data structure of the Behavior Engine is an augmented transition network (ATN), described in more detail in Section 3; a minimal illustrative sketch of the general ATN idea appears below. The Behavior Engine also controls the dynamic loading of contexts and knowledge for use by the Language Processor.

The virtual characters are rendered via a Visualization Engine that performs gesture, movement, and speech actions through morphing of vertices of a 3D model and playing of key-framed animation files (largely based on motion capture data). Physical interaction with the virtual character (e.g., using medical instruments) is realized via object-based and instrument-specific selection maps [13]. These interactions are controlled by both the Behavior Engine and the Visualization Engine.
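To make the ATN idea concrete for readers unfamiliar with the formalism, the following is a minimal Python sketch of an ATN-style dialog controller. It is an illustration of the general technique under our own assumptions, not the authors' implementation (which is described in Section 3): states are dialog states, and each arc carries a test on the user's utterance plus an action that updates a set of registers (the "augmentation") and produces the system's response. All names (Arc, ATN, run_turn, the example states and prompts) are hypothetical.

    # Illustrative ATN-style dialog controller (hypothetical sketch, not the authors' code).
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Tuple

    Registers = Dict[str, object]   # the "augmentation": mutable registers carried across turns

    @dataclass
    class Arc:
        test: Callable[[str, Registers], bool]    # condition on the user utterance / registers
        action: Callable[[str, Registers], str]   # updates registers, returns the system response
        target: str                               # name of the destination state

    @dataclass
    class ATN:
        start: str
        arcs: Dict[str, List[Arc]] = field(default_factory=dict)

        def add_arc(self, state: str, arc: Arc) -> None:
            self.arcs.setdefault(state, []).append(arc)

        def run_turn(self, state: str, utterance: str, regs: Registers) -> Tuple[str, str]:
            # Follow the first arc out of the current state whose test succeeds.
            for arc in self.arcs.get(state, []):
                if arc.test(utterance, regs):
                    return arc.target, arc.action(utterance, regs)
            return state, "Could you rephrase that?"   # no arc matched: stay put and re-prompt

    # A two-state fragment of an invented patient-interview dialog.
    def greet_action(utt: str, regs: Registers) -> str:
        regs["greeted"] = True
        return "Hello. What brings you in today?"

    net = ATN(start="greet")
    net.add_arc("greet", Arc(test=lambda u, r: True,
                             action=greet_action,
                             target="chief_complaint"))
    net.add_arc("chief_complaint", Arc(test=lambda u, r: "pain" in u.lower(),
                                       action=lambda u, r: "Where does it hurt?",
                                       target="locate_pain"))

    state, regs = net.start, {}
    state, reply = net.run_turn(state, "hi there", regs)           # -> "Hello. What brings you in today?"
    state, reply = net.run_turn(state, "I have chest pain", regs)  # -> "Where does it hurt?"

In a full system like the one described in this paper, the arc tests would presumably consult the Language Processor's semantic representation rather than raw text, and the actions would also drive gestures and world actions through the Behavior Engine; the sketch shows only the control structure.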

[1] Laura Flicker, et al. Usability and Acceptability Studies of Conversational Virtual Human Technology, 2004, SIGDIAL Workshop.

[2] E. André, et al. Exploiting Models of Personality and Emotions to Control the Behavior of Animated Interactive Agents, 2000.

[3] B. Hayes-Roth, et al. Improvisational Synthetic Actors with Flexible Personalities, Knowledge Systems Laboratory Report No. KSL 97-10, December 1997.

[4] Thomas Rist, et al. Employing AI Methods to Control the Behavior of Animated Interface Agents, 1999, Appl. Artif. Intell.

[5] Robert Hubal, et al. The virtual pediatric standardized patient application: formative evaluation findings, 2005, Studies in health technology and informatics.

[6] James A. Russell, et al. How shall an emotion be called, 1997.

[7] Michael W. Link, et al. Accessibility and acceptance of responsive virtual human technology as a survey interviewer training tool, 2006, Comput. Hum. Behav.

[8] P. N. Kizakevich, et al. The virtual standardized patient. Simulated patient-practitioner dialog for patient interview training, 2000, Studies in health technology and informatics.

[9] Richard A. Robb, et al. Medicine meets virtual reality 2000: envisioning healing: interactive technology and the patient-practitioner dialogue, 2000.

[10] P. N. Kizakevich, et al. Virtual medical trainer. Patient assessment and trauma care simulator, 1998, Studies in health technology and informatics.

[11] Jonas Beskow, et al. Developing a 3D-agent for the august dialogue system, 1999, AVSP.

[12] Arne Jönsson, et al. Wizard of Oz studies: why and how, 1993, IUI '93.

[13] James C. Lester, et al. The persona effect: affective impact of animated pedagogical agents, 1997, CHI.

[14] Geoffrey A. Frank, et al. Lessons learned in modeling schizophrenic and depressed responsive virtual humans for training, 2003, IUI '03.

[15] W. Lewis Johnson, et al. Animated Agents for Procedural Training in Virtual Reality: Perception, Cognition, and Motor Control, 1999, Appl. Artif. Intell.

[16] Robert C. Hubal, et al. Informed consent procedures: An experimental test using a virtual character in a dialog systems training application, 2006, J. Biomed. Informatics.

[17] Curry Guinn, et al. A Synthetic Character Application for Informed Consent, 2004, AAAI Technical Report.

[18] Diana L. Eldreth, et al. Psychometric properties of virtual reality vignette performance measures: a novel approach for assessing adolescents' social competency skills, 2004, Health education research.

[19] Curry Guinn, et al. A Mixed-Initiative Intelligent Tutoring Agent for Interaction Training.

[20] Arthur C. Graesser, et al. AutoTutor: A simulation of a human tutor, 1999, Cognitive Systems Research.

[21] Andrew Ortony, et al. The Cognitive Structure of Emotions, 1988.

[22] Curry I. Guinn, et al. An evaluation of virtual human technology in informational kiosks, 2004, ICMI '04.

[23] Robert Hubal, et al. Extracting Emotional Information from the Text of Spoken Dialog, 2003.