Knowledge acquisition for a constrained speech system using WoZ

This paper describes the knowledge acquisition phase in a national project aimed at the design of realistic spoken language dialogue system prototypes in the domain of airline ticket reservation and flight information [Dybkjær and Dybkjær, 1993]. The goals of the knowledge acquisition phase were to define a dialogue structure and a sublanguage vocabulary and grammar for subsequent implementation of a first prototype. The development method was the Wizard of Oz simulation technique [Fraser and Gilbert, 1991]. The dialogue model had to satisfy a number of conflicting constraints, most importantly: (1) A maximum user vocabulary of 500 word forms. (2) A maximum user utterance length of 10 words and an average length of 3-4 words. (3) A usable dialogue, including sufficient domain and task coverage, robustness and real-time system performance. (4) A natural form of dialogue and language. A usable system is one which can do the tasks required of it. In principle, it can replace a human operator on those tasks. A natural system, on the other hand, is one which allows users to use free and unconstrained spontaneous speech in efficiently achieving their goals. In the development of the first prototype to be described here, the focus was on usability (constraints (1)-(3) above) and on laying the foundations for meeting the naturalness constraint (4) in a second prototype. The real-time requirement of (3) forces the recogniser to handle at most 100 active words at a time, and together with (1) and (2) this pushes the dialogue model towards a rigid system-directed dialogue structure. Seven iterations of Wizard of Oz experiments were performed, involving taped and transcribed dialogues between the wizard and subjects. Voice-distorting hardware (equalizer and harmonizer) was used only in the final set of experiments. A wizard's assistant was used in the last three sets of experiments.
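Constraints (1) and (2) are directly measurable on the transcribed user turns. The following is a minimal sketch of such a check, not the procedure used in the project. The names `tokenize`, `check_corpus` and `CorpusReport` are hypothetical, tokenisation is a naive whitespace split, and the "3-4 words" average is read here as an upper bound of 4, which is an assumption.

```python
import string
from dataclasses import dataclass

MAX_VOCABULARY = 500        # constraint (1): distinct user word forms
MAX_UTTERANCE_LENGTH = 10   # constraint (2): words per user utterance
MAX_AVERAGE_LENGTH = 4.0    # constraint (2): assumed upper bound on the average


@dataclass
class CorpusReport:
    vocabulary_size: int
    longest_utterance: int
    average_length: float

    def violations(self) -> list[str]:
        """Return a description of each constraint the corpus exceeds."""
        found = []
        if self.vocabulary_size > MAX_VOCABULARY:
            found.append(f"vocabulary of {self.vocabulary_size} word forms")
        if self.longest_utterance > MAX_UTTERANCE_LENGTH:
            found.append(f"utterance of {self.longest_utterance} words")
        if self.average_length > MAX_AVERAGE_LENGTH:
            found.append(f"average length of {self.average_length:.2f} words")
        return found


def tokenize(utterance: str) -> list[str]:
    """Naive tokeniser: whitespace split, punctuation stripped, lower-cased.

    Inflected forms are kept distinct, since the limit is on word forms.
    """
    words = (w.strip(string.punctuation).lower() for w in utterance.split())
    return [w for w in words if w]


def check_corpus(utterances: list[str]) -> CorpusReport:
    """Measure vocabulary size and utterance lengths over user utterances."""
    tokenized = [t for t in (tokenize(u) for u in utterances) if t]
    vocabulary = {word for tokens in tokenized for word in tokens}
    lengths = [len(tokens) for tokens in tokenized]
    return CorpusReport(
        vocabulary_size=len(vocabulary),
        longest_utterance=max(lengths, default=0),
        average_length=sum(lengths) / len(lengths) if lengths else 0.0,
    )
```

Run over the transcriptions from each iteration, a report of this kind would show whether a change to the wizard's phrasing moved user language closer to the limits.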
From iteration 3 onwards, the wizard used a graph structure based on the notion of basic tasks and containing canned phrases in the nodes and contents of possible user answers along the edges. In addition, users were instructed to answer questions briefly and one at a time in order to be understood by the system. Users were given broadly described scenarios