We aim to build an extensible, adaptable dialog system that handles multiple tasks and senses users' intentions through multiple modalities. This requires a flexible platform for integrating different dialog strategies and modalities. In this paper, we propose an efficient approach to dialog management based on a weighted finite-state transducer (WFST) whose input is the user's concept tags and whose output is the system's action tags. Incorporating WFSTs into dialog management lets different components be integrated easily and run on a common platform. We have constructed a prototype spoken dialog system, the Kyoto tour guide, which helps users plan a one-day sightseeing trip through interaction. A WFST for dialog management was created from the annotated transcripts of the Kyoto tour guide dialog corpus that we recorded. This WFST was then composed with a word-to-concept WFST for language understanding, and the result was optimized. We confirmed that our WFST-based dialog manager successfully accepted recognition results from a speech recognizer and behaved as designed.
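To make the composition step concrete, the following is a minimal sketch, not the system described here. It assumes epsilon-free transducers and tropical-semiring weights (costs add along a path, and the lowest total cost wins). The `WFST` class, the `compose` and `best_path` helpers, and every word, concept, and action label are hypothetical and chosen only for illustration. A real word-to-concept transducer would also need epsilon transitions to map several words to one concept, and the optimization step (for example determinization and minimization) is omitted.

```python
from collections import defaultdict


class WFST:
    """Toy weighted finite-state transducer (hypothetical helper class)."""

    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)
        # state -> list of (input label, output label, weight, next state)
        self.arcs = defaultdict(list)

    def add_arc(self, src, ilabel, olabel, weight, dst):
        self.arcs[src].append((ilabel, olabel, weight, dst))


def compose(a, b):
    """Epsilon-free composition: match a's output labels with b's input labels.

    Weights are added, as in the tropical semiring. Only state pairs
    reachable from the paired start state are built.
    """
    start = (a.start, b.start)
    result = WFST(start, [])
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        qa, qb = state
        if qa in a.finals and qb in b.finals:
            result.finals.add(state)
        for ia, oa, wa, na in a.arcs.get(qa, []):
            for ib, ob, wb, nb in b.arcs.get(qb, []):
                if oa == ib:
                    nxt = (na, nb)
                    result.add_arc(state, ia, ob, wa + wb, nxt)
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return result


def best_path(fst, inputs):
    """Return (outputs, cost) for the cheapest accepting path, or None."""
    hyps = {fst.start: (0.0, [])}  # state -> (cost so far, outputs so far)
    for sym in inputs:
        nxt = {}
        for state, (cost, outs) in hyps.items():
            for ilabel, olabel, weight, dst in fst.arcs.get(state, []):
                if ilabel == sym:
                    total = cost + weight
                    if dst not in nxt or total < nxt[dst][0]:
                        nxt[dst] = (total, outs + [olabel])
        hyps = nxt
    accepted = [h for s, h in hyps.items() if s in fst.finals]
    if not accepted:
        return None
    cost, outs = min(accepted, key=lambda h: h[0])
    return outs, cost


# Word-to-concept transducer (language understanding); labels are made up.
w2c = WFST(start=0, finals=[0])
w2c.add_arc(0, "hello", "GREET", 0.0, 0)
w2c.add_arc(0, "temple", "ASK_TEMPLE", 0.5, 0)
w2c.add_arc(0, "temple", "ASK_SHRINE", 2.0, 0)  # less likely reading
w2c.add_arc(0, "shrine", "ASK_SHRINE", 0.5, 0)

# Concept-to-action transducer (dialog management); labels are made up.
dm = WFST(start=0, finals=[2])
dm.add_arc(0, "GREET", "SYS_GREET", 0.0, 1)
dm.add_arc(1, "ASK_TEMPLE", "SUGGEST_TEMPLE", 0.0, 2)
dm.add_arc(1, "ASK_SHRINE", "SUGGEST_SHRINE", 0.0, 2)

# The composed transducer maps recognized words directly to system actions.
dialog = compose(w2c, dm)
print(best_path(dialog, ["hello", "temple"]))
```

In this toy setup the composed transducer maps the word sequence `hello temple` to the action sequence `SYS_GREET`, `SUGGEST_TEMPLE` at cost 0.5, preferring the cheaper concept reading of "temple". An input that the dialog manager does not expect in its current state, such as `temple` with no preceding greeting, has no accepting path and yields `None`.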