Probabilistic grounding of situated speech using plan recognition and reference resolution

Situated, spontaneous speech may be ambiguous along acoustic, lexical, grammatical and semantic dimensions. To understand such a seemingly difficult signal, we propose to model the ambiguity inherent in acoustic signals and in lexical and grammatical choices using compact, probabilistic representations of multiple hypotheses. To resolve semantic ambiguities, we propose a situation model that captures aspects of the physical context of an utterance as well as the speaker's intentions, which in our case are represented by recognized plans. In a single, coherent Framework for Understanding Situated Speech (FUSS), we show how these two influences, acting on an ambiguous representation of the speech signal, complement each other to disambiguate the form and content of situated speech. This method produces promising results in a game-playing environment and leaves room for other types of situation models.
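
The core idea of letting a situation model act on multiple weighted speech hypotheses can be illustrated with a minimal sketch. This is not the FUSS implementation: the function name `rescore`, the example utterances, and all probability values are hypothetical, and the situation model is reduced to a single likelihood per hypothesis (standing in for how well the hypothesis fits the recognized plan and the objects available for reference). The sketch simply multiplies each recognizer probability by that likelihood and renormalizes.

```python
from typing import Dict


def rescore(hypotheses: Dict[str, float],
            situation_likelihood: Dict[str, float]) -> Dict[str, float]:
    """Reweight speech hypotheses by how well each fits the situation.

    hypotheses: recognizer probability for each candidate utterance.
    situation_likelihood: hypothetical score for each candidate given the
        physical context and the speaker's recognized plan.
    Returns a normalized posterior over the candidates.
    """
    # Joint score: recognizer evidence times situational plausibility.
    joint = {h: p * situation_likelihood.get(h, 0.0)
             for h, p in hypotheses.items()}
    total = sum(joint.values())
    if total == 0.0:
        # The situation model rules out everything; keep the recognizer's view.
        return dict(hypotheses)
    return {h: score / total for h, score in joint.items()}


# Illustrative values only: the recognizer slightly prefers "drawer",
# but the current plan and scene make "door" far more plausible.
speech = {"open the door": 0.4, "open the drawer": 0.6}
situation = {"open the door": 0.9, "open the drawer": 0.1}

posterior = rescore(speech, situation)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this toy case the situational evidence overturns the recognizer's first choice, which is the kind of complementary disambiguation the abstract describes. A fuller system would operate on lattices or parse forests rather than a flat list of strings.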
