ISIS: an adaptive, trilingual conversational system with interleaving interaction and delegation dialogs

ISIS (Intelligent Speech for Information Systems) is a trilingual spoken dialog system (SDS) for the stock-market domain. It handles two dialects of Chinese (Cantonese and Putonghua) as well as English, the predominant languages in our region. The system supports spoken queries about stock-market information and simulated personal portfolios. The conversational interface is augmented with a screen display that captures mouse clicks as well as textual input entered by typing or stylus writing. Real-time information is retrieved directly from a dedicated Reuters satellite feed. ISIS serves as a test-bed for our work in multilingual speech recognition and generation, speaker authentication, language understanding, and dialog modeling. This article reports on our new explorations within the context of ISIS, including: (i) adaptivity to knowledge-scope expansion; (ii) asynchronous human-computer interaction through task delegation to software agents; and (iii) multi-threaded dialogs that interleave online interaction with offline delegation, supporting interruptions for task switching.
