Dynamically structuring, updating and interrelating representations of visual and linguistic discourse context

The fundamental claim of this paper is that salience, both visual and linguistic, is an important overarching semantic category structuring visually situated discourse. On this basis, we argue that computer systems attempting to model the evolving context of a visually situated discourse should integrate models of visual and linguistic salience within their natural language processing (NLP) framework. The paper highlights the importance of dynamically updating and interrelating visual and linguistic discourse context representations. To support our approach, we have developed a real-time natural language virtual reality (NLVR) system, called LIVE (Linguistic Interaction with Virtual Environments), that implements an NLP framework based on both visual and linguistic salience. Within this framework, salience information underpins two of the core subtasks of NLP: reference resolution and the generation of referring expressions. We describe the theoretical basis and architecture of the LIVE NLP framework and present extensive evaluation results comparing the system's performance with that of human participants in a number of experiments.
