ISIS: an adaptive, trilingual conversational system with interleaving interaction and delegation dialogs

ISIS (Intelligent Speech for Information Systems) is a trilingual spoken dialog system (SDS) for the stock-market domain. It handles two dialects of Chinese (Cantonese and Putonghua) as well as English, the predominant languages in our region. The system supports spoken queries about stock-market information and simulated personal portfolios. The conversational interface is augmented with a screen display that captures mouse clicks as well as textual input entered by typing or stylus writing. Real-time information is retrieved directly from a dedicated Reuters satellite feed. ISIS serves as a test-bed for our work in multilingual speech recognition and generation, speaker authentication, language understanding, and dialog modeling. This article reports on our new explorations within the context of ISIS, including: (i) adaptivity to knowledge-scope expansion; (ii) asynchronous human-computer interaction through task delegation to software agents; and (iii) multi-threaded dialogs that interleave online interaction with offline delegation, supporting interruptions for task switching.
