A Model for Multimodal Reference Resolution

An important aspect of interpreting multimodal messages is the ability to identify when the same object in the world is the referent of symbols in different modalities. To understand the caption of a picture, for instance, one needs to identify the graphical symbols that are referred to by names and pronouns in the natural language text. One way to think of this problem is in terms of anaphora. In linguistic anaphoric inference, antecedents for pronouns are selected from a linguistic context; in interpreting the textual part of a multimodal message, by contrast, the antecedents are selected from a graphical context. Under this first view, resolving multimodal references is like resolving anaphora across modalities. Another way to see the same problem is to treat pronouns in texts about drawings as deictic. Under this second view, the context of interpretation of a natural language term is defined as a set of expressions of a graphical language with well-defined syntax and semantics, and natural language and graphical terms are taken to stand in a translation relation similar to the one that holds between natural languages. This paper presents a theory based on the second view. The theory addresses the relation between multimodal representation and spatial deixis, on the one hand, and between multimodal reasoning and deictic inference, on the other. An integrated model of anaphoric and deictic resolution for the interpretation of multimodal discourse is also advanced.
