Generating Referring Expressions in a Multimodal Context: An Empirically Oriented Approach

This paper presents an algorithm for generating referring expressions in a multimodal setting, based on empirical studies of how humans refer to objects in a shared workspace. The algorithm has three main ingredients. First, it can include deictic pointing gestures, where the decision to point is determined by two factors: the effort of pointing (measured in terms of the distance to and the size of the target object) and the effort required for a full linguistic description (measured in terms of the number of required properties and relations). Second, the algorithm explicitly keeps track of the current focus of attention, so that objects closely related to the most recently mentioned object are more prominent than objects farther away; to decide which objects count as ‘closely related’, we make use of the concept of perceptual grouping. Finally, each object in the domain is assigned a three-dimensional salience weight indicating whether it is linguistically salient, whether it is inherently salient, and whether it is part of the current focus of attention. The resulting algorithm generates a variety of referring expressions, where the kind of NP is co-determined by the accessibility of the target object (in terms of salience), the presence or absence of a relatum, and the possible inclusion of a pointing gesture.
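The pointing decision described above can be sketched as a simple cost comparison. This is a minimal illustration, not the paper's actual implementation: the Fitts-style difficulty index for pointing effort and the weighting of relations against properties in the description cost are assumptions chosen for the sketch.

```python
import math

def pointing_effort(distance, size):
    """Effort of pointing at a target, modeled here (as an assumption)
    with a Fitts-style index of difficulty: distant, small targets
    are harder to point at than nearby, large ones."""
    return math.log2(2 * distance / size)

def description_effort(n_properties, n_relations, relation_weight=2.0):
    """Effort of a full linguistic description, approximated by the
    number of required properties and relations; relations are
    (hypothetically) weighted as costlier than simple properties."""
    return n_properties + relation_weight * n_relations

def should_point(distance, size, n_properties, n_relations):
    """Decide to point when pointing is cheaper than describing."""
    return pointing_effort(distance, size) < description_effort(
        n_properties, n_relations)

# A near, large target that needs a complex description favors pointing;
# a far, small target that is easy to describe favors a linguistic description.
print(should_point(distance=2.0, size=2.0, n_properties=2, n_relations=1))
print(should_point(distance=10.0, size=1.0, n_properties=1, n_relations=0))
```

The same trade-off could be tuned differently (e.g. other effort scales or weights); the point of the sketch is only the two-factor comparison between gesture cost and description cost.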
