Situated Incremental Natural Language Understanding using a Multimodal, Linguistically-driven Update Model

A common site of language use is interactive dialogue between two people situated together in shared time and space. In this paper, we present a statistical model for understanding natural human language that works incrementally (i.e., it does not wait until the end of an utterance to begin processing) and is grounded by linking semantic entities to objects in a shared space. We describe our model, show how a semantic meaning representation is grounded in properties of real-world objects, and further show that it can be grounded using embodied, interactive cues such as pointing gestures or eye gaze.
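To make the incremental idea concrete, here is a minimal toy sketch (not the authors' model, and all names here are hypothetical): after each new word arrives, a distribution over candidate objects in the shared scene is rescaled by how well each object's visual properties match the word, so a best-guess referent is available mid-utterance rather than only at utterance end.

```python
# Toy scene: candidate objects with simple visual properties.
SCENE = {
    "red_cross": {"color": "red", "shape": "cross"},
    "red_circle": {"color": "red", "shape": "circle"},
    "green_cross": {"color": "green", "shape": "cross"},
}

# Toy lexicon: each known word evidences one (property, value) pair.
LEXICON = {
    "red": ("color", "red"),
    "green": ("color", "green"),
    "cross": ("shape", "cross"),
    "circle": ("shape", "circle"),
}

def update(beliefs, word, match=0.9, mismatch=0.1):
    """Incrementally rescale the belief distribution given one new word."""
    if word not in LEXICON:
        return beliefs  # unknown words leave the distribution unchanged
    prop, value = LEXICON[word]
    scored = {obj: p * (match if SCENE[obj][prop] == value else mismatch)
              for obj, p in beliefs.items()}
    total = sum(scored.values())
    return {obj: s / total for obj, s in scored.items()}

# Start from a uniform prior; a deictic cue (pointing, gaze) could
# instead bias this prior toward the indicated object.
beliefs = {obj: 1.0 / len(SCENE) for obj in SCENE}
for word in ["the", "red", "cross"]:
    beliefs = update(beliefs, word)
    print(word, max(beliefs, key=beliefs.get))
```

The point of the sketch is only the update loop: each word narrows the distribution without reparsing the prefix, which is the property the paper's incremental model exploits for responsive situated dialogue.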
