Speech, Image, and Language Processing for Human-Computer Interaction: Multi-Modal Advancements

Multimodal systems have attracted increasing attention in recent years, enabling important advances in the technologies for recognizing, processing, and generating multimodal information. However, many issues related to multimodality remain open; for example, the principles that would allow systems to approach the richness of human-human multimodal communication are still unclear. This chapter focuses on some of the most important challenges that researchers have recently envisioned for future multimodal interfaces. It also describes current efforts to develop intelligent, adaptive, proactive, portable, and affective multimodal interfaces.

Nieves Ábalos, University of Granada, CITIC-UGR, Spain
DOI: 10.4018/978-1-4666-0954-9.ch013
