Challenges in Building Highly-Interactive Dialog Systems

Spoken dialog researchers have recently demonstrated highly-interactive systems in several domains. This paper considers how to build on these advances to make systems more robust, easier to develop, and more scientifically significant. We identify key challenges whose solution would lead to improvements in dialog systems and beyond.
