Efficient parsing of spoken inputs for human-robot interaction

The use of deep parsers in spoken dialogue systems is usually subject to strong performance requirements. This is particularly the case in human-robot interaction, where computing resources are limited and must be shared among many components running in parallel. A real-time dialogue system must be capable of responding quickly to any given utterance, even in the presence of noisy, ambiguous, or distorted input. The parser must therefore ensure that the number of analyses remains bounded at every processing step. This paper presents a practical approach to addressing this issue in the context of deep parsers designed for spoken dialogue. The approach is based on a word lattice parser combined with a statistical model for parse selection. Each word lattice is parsed incrementally, word by word, and a discriminative model is applied at each incremental step to prune the set of resulting partial analyses. The model incorporates a wide range of linguistic and contextual features and can be trained with a simple perceptron. The approach is fully implemented as part of a spoken dialogue system for human-robot interaction. Evaluation results on a Wizard-of-Oz test suite demonstrate significant improvements in parsing time.
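The core idea of incremental parsing with discriminative pruning can be sketched in a few lines. The following is a minimal toy illustration, not the paper's actual system: the category set, the feature templates, and the single-update training rule are all simplified placeholders, and partial analyses are reduced to flat category sequences rather than genuine deep-parse structures. What it does show is how a linear model scored over features can keep the beam of partial analyses bounded at every word, and how a simple perceptron update adjusts the weights when the top-scoring analysis differs from the gold one.

```python
# Hypothetical toy setup: each word receives one of a few categories,
# and a "partial analysis" is just the category sequence chosen so far.
CATEGORIES = ["np", "vp", "det", "n"]

def features(analysis, word):
    # Simplified feature templates: current word paired with its category,
    # plus category bigrams over the partial analysis.
    feats = [f"word={word}|cat={analysis[-1]}"]
    if len(analysis) >= 2:
        feats.append(f"bigram={analysis[-2]}>{analysis[-1]}")
    return feats

def score(weights, analysis, word):
    # Linear model: sum of weights for the active features.
    return sum(weights.get(f, 0.0) for f in features(analysis, word))

def parse_incremental(words, weights, beam_size=3):
    """Extend each partial analysis word by word, pruning to beam_size
    after every step so the number of analyses stays bounded."""
    beam = [((), 0.0)]
    for word in words:
        candidates = []
        for analysis, s in beam:
            for cat in CATEGORIES:
                new = analysis + (cat,)
                candidates.append((new, s + score(weights, new, word)))
        candidates.sort(key=lambda x: x[1], reverse=True)
        beam = candidates[:beam_size]  # discriminative pruning step
    return beam

def perceptron_update(weights, words, gold, beam_size=3, lr=1.0):
    """One simple perceptron step: reward the gold analysis's features
    and penalise the current best analysis if it differs from gold."""
    best, _ = parse_incremental(words, weights, beam_size)[0]
    if best == gold:
        return
    for i, word in enumerate(words):
        for f in features(gold[:i + 1], word):
            weights[f] = weights.get(f, 0.0) + lr
        for f in features(best[:i + 1], word):
            weights[f] = weights.get(f, 0.0) - lr
```

A typical usage pattern would be to start from empty weights, call `perceptron_update` repeatedly over annotated utterances, and then use `parse_incremental` at run time; because pruning happens after every word, the cost per utterance grows linearly with its length for a fixed beam size.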
