A realistic Wizard of Oz simulation of a multimodal spoken language system

This paper describes a Wizard of Oz (WOZ) system that allows the realistic simulation of a multimodal spoken language system. A Wizard protocol has been drawn up to ensure that the WOZ system simulates the limitations of an automatic system rather than allowing the user to engage in the full range of human-human dialogue. The protocol is supported by a sophisticated Wizard response panel and underlying response generation functionality, which together enable the Wizard to respond to complex multimodal inputs in near real time. The chosen application is a 3D retail service in which users can, for example, select furnishings from a database according to attributes such as colour, pattern and fabric type, transfer furnishings to objects in a virtual showroom, and ask about prices and the matching of fabrics. The system includes a “virtual assistant”, i.e. a synthetic persona that speaks the verbal system output. Users interact through a combination of fluent speech and touchscreen input. The paper describes a formal trial carried out with the WOZ system and discusses the results.