Situated Language Understanding as Filtering Perceived Affordances

We introduce a computational theory of situated language understanding in which the meaning of words and utterances depends on the physical environment and on the goals and plans of communication partners. According to the theory, concepts that ground linguistic meaning are neither internal nor external to language users, but instead span the objective-subjective boundary. To model the possible interactions between subject and object, the theory relies on the notion of perceived affordances: structured units of interaction that can be used for prediction at multiple levels of abstraction. Language understanding is treated as a process of filtering perceived affordances. The theory accounts for many aspects of the situated nature of human language use and provides a unified solution to several demands on any theory of language understanding, including conceptual combination, prototypicality effects, and the generative nature of lexical items. To support the theory, we describe an implemented system that understands verbal commands situated in a virtual gaming environment. The implementation uses probabilistic hierarchical plan recognition to generate perceived affordances. The system has been evaluated on its ability to correctly interpret free-form, spontaneous verbal commands recorded from unrehearsed game play between human players. The system is able to "step into the shoes" of human players and correctly respond to a broad range of verbal commands in which linguistic meaning depends on social and physical context. We quantitatively compare the system's predictions in response to direct player commands with the actions taken by human players and show generalization to unseen data across a range of situations and verbal constructions.
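
As a rough illustration of what "filtering perceived affordances" could look like, the following Python sketch treats each perceived affordance as a candidate interaction with a predicted probability, and treats each known word as a filter over those candidates. Everything here is an assumption made for illustration: the `Affordance` fields, the `LEXICON` entries, the `understand` function, and the hand-written probabilities are hypothetical, and the probabilities stand in for the output of the plan recognizer, which is not modeled. It is not the implemented system described in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Affordance:
    """A candidate interaction predicted for the listener (hypothetical structure)."""
    action: str   # e.g. "open"
    target: str   # e.g. "door_1"
    prob: float   # stand-in for a plan recognizer's predicted probability


WordFilter = Callable[[Affordance], bool]

# Hypothetical lexicon: each word maps to a predicate over affordances.
LEXICON: Dict[str, WordFilter] = {
    "open": lambda a: a.action == "open",
    "pull": lambda a: a.action == "pull",
    "door": lambda a: a.target.startswith("door"),
    "chest": lambda a: a.target.startswith("chest"),
}


def understand(utterance: str, affordances: List[Affordance]) -> List[Affordance]:
    """Apply each known word as a filter, then renormalize and rank what remains."""
    kept = affordances
    for word in utterance.lower().split():
        word_filter = LEXICON.get(word)
        if word_filter is None:
            continue  # words without a lexicon entry are ignored in this sketch
        kept = [a for a in kept if word_filter(a)]

    total = sum(a.prob for a in kept)
    if total == 0:
        return []  # no perceived affordance is compatible with the utterance

    ranked = [Affordance(a.action, a.target, a.prob / total) for a in kept]
    return sorted(ranked, key=lambda a: a.prob, reverse=True)


# Hand-written stand-in for the affordances a listener might perceive.
perceived = [
    Affordance("open", "door_1", 0.5),
    Affordance("open", "door_2", 0.1),
    Affordance("open", "chest_1", 0.2),
    Affordance("pull", "lever_1", 0.2),
]

result = understand("open the door", perceived)
for a in result:
    print(f"{a.action} {a.target}: {a.prob:.2f}")
```

In this toy setting, "open the door" leaves only the two door-opening affordances, and the one that was more probable before the utterance remains the preferred interpretation. That is one simple way context could resolve an otherwise ambiguous command.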
