Using eye movements to determine referents in a spoken dialogue system

Most computational spoken dialogue systems take a "literary" approach to reference resolution. Under this approach, entities mentioned by a human interlocutor are unified with elements in the world state based on the same principles that guide the process during text interpretation. In human-to-human interaction, however, referring is a much more collaborative process. Participants often underspecify their referents, relying on their discourse partners for feedback if more information is needed to uniquely identify a particular referent. By monitoring eye movements during this interaction, it is possible to improve the performance of a spoken dialogue system on referring expressions that are underspecified according to the literary model. This paper describes a system currently under development that employs such a strategy.
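To make the strategy concrete, here is a minimal sketch (not the system described in this paper) of how gaze evidence might break ties that a purely literary resolver cannot. It assumes a simple world model and fixation log; all names (Entity, Fixation, resolve_referent) and the 1.5-second gaze window are hypothetical illustrations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    id: str
    properties: frozenset  # e.g. frozenset({"red", "block"})

@dataclass(frozen=True)
class Fixation:
    entity_id: str
    onset: float      # seconds from session start
    duration: float   # seconds

def resolve_referent(description, entities, fixations, utterance_time,
                     window=1.5):
    """Resolve a possibly underspecified referring expression.

    Step 1 ("literary"): keep only entities whose properties satisfy
    the spoken description. Step 2 (gaze): if more than one candidate
    survives, prefer the candidate the speaker fixated longest within
    `window` seconds of the utterance.
    """
    candidates = [e for e in entities if description <= e.properties]
    if len(candidates) <= 1:
        return candidates[0] if candidates else None

    # Accumulate fixation time on each surviving candidate near the
    # utterance; gaze acts as a disambiguating cue, not a hard filter.
    gaze_time = {e.id: 0.0 for e in candidates}
    for f in fixations:
        if f.entity_id in gaze_time and abs(f.onset - utterance_time) <= window:
            gaze_time[f.entity_id] += f.duration

    best = max(candidates, key=lambda e: gaze_time[e.id])
    # No gaze evidence: leave the expression unresolved so the dialogue
    # manager can request clarification, as a human partner would.
    return best if gaze_time[best.id] > 0 else None
```

Returning None when gaze provides no evidence mirrors the collaborative model sketched above: rather than guessing, the system can ask the user for the additional information needed to identify the referent.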