LIESHOU: A Mandarin Conversational Task Agent for the Galaxy-II Architecture

Multilinguality is an important component of spoken dialogue systems, both because it makes the systems available to a wider audience and because it leads to a more flexible system dialogue strategy. This thesis concerns the development of a Chinese language capability for the ORION system, one of many spoken dialogue systems available within the GALAXY-II architecture. The new system, which we call LIESHOU, interacts with Mandarin-speaking users and performs off-line tasks, initiating later contact with a user at a pre-negotiated time. The design and development of LIESHOU closely followed that of similar multilingual GALAXY-II domains, such as MUXING (Chinese JUPITER) and PHRASEBOOK (Translation Guide for Foreign Travelers). The successful deployment of LIESHOU required the design and implementation of four main components: speech recognition, natural language understanding, language generation, and speech synthesis. These four components were implemented using the SUMMIT speech recognition system, the TINA natural language understanding system, the GENESIS-II language generation system, and the ENVOICE speech synthesis system, respectively. The development of the necessary resources for each of these components is described in detail, and a system evaluation is given for the final implementation.

Thesis Supervisor: Stephanie Seneff
Title: Principal Research Scientist
