A model for multimodal reference resolution

This paper discusses multimodal reference resolution. The discussion centers on how the referent of an expression in one modality can be found when the contextual information required to carry out that inference is expressed in one or more different modalities. In particular, we present a model for identifying the referent of a graphical expression when the relevant contextual information is expressed through natural language. The model is also applied to the reciprocal problem of identifying the referent of a linguistic expression when a graphical context is given. Section 1 introduces the notion of modality on which the theory is built and motivates the discussion with a case study in multimodal reference resolution. Section 2 presents a theory of multimodal representation along the lines of Montague's semiotic programme. Section 3 illustrates an incremental model for multimodal reference resolution. Section 4 briefly discusses how the theory could be extended to handle multimodal discourse. Finally, the conclusion offers a reflection on the relation between spatial deixis and anaphora.