A Simulation-based Framework for Spoken Language Understanding and Action Selection in Situated Interaction

This paper introduces a simulation-based framework for spoken language understanding and action selection in interactive agents. By simulating the objects and actions relevant to an interaction, an agent can semantically ground natural language and interact considerately and on its own initiative in situated environments. The framework uses models of the environment, user, and system to predict possible future world states via simulation. It combines spoken language understanding with multimodal input to estimate the state of the ongoing interaction, and it selects actions based on the utility of future outcomes in the simulated world. We demonstrate the framework's effectiveness for in-car navigation.
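
To make the idea of utility-based action selection over simulated outcomes concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes a toy in-car navigation scenario with a one-step lookahead. The names `WorldState`, `simulate`, `utility`, `expected_utility`, and `select_action`, along with all numeric thresholds and penalties, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class WorldState:
    """Hypothetical, highly simplified world state for in-car navigation."""
    distance_to_turn_m: float
    driver_speaking: bool
    turn_announced: bool


def simulate(state: WorldState, action: str,
             speed_mps: float = 15.0, dt_s: float = 2.0) -> WorldState:
    """Assumed forward model: move the car ahead and apply the action's effect."""
    remaining = max(0.0, state.distance_to_turn_m - speed_mps * dt_s)
    announced = state.turn_announced or action == "announce_turn"
    return replace(state, distance_to_turn_m=remaining, turn_announced=announced)


def utility(prev: WorldState, action: str, nxt: WorldState) -> float:
    """Assumed utility: penalize missed turns, interruptions, and repetition."""
    score = 0.0
    if nxt.distance_to_turn_m <= 50.0 and not nxt.turn_announced:
        score -= 10.0  # the driver is likely to miss the turn
    if action == "announce_turn" and prev.driver_speaking:
        score -= 3.0   # speaking over the driver is inconsiderate
    if action == "announce_turn" and prev.turn_announced:
        score -= 1.0   # repeating the instruction is redundant
    return score


def expected_utility(belief: Dict[WorldState, float], action: str) -> float:
    """Average the utility of the simulated outcome over the belief state."""
    return sum(p * utility(s, action, simulate(s, action))
               for s, p in belief.items())


def select_action(belief: Dict[WorldState, float], actions: List[str]) -> str:
    """Pick the action whose simulated outcome has the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(belief, a))


def make_belief(distance_m: float, p_speaking: float) -> Dict[WorldState, float]:
    """Belief over whether the driver is currently speaking (e.g. from audio input)."""
    return {
        WorldState(distance_m, True, False): p_speaking,
        WorldState(distance_m, False, False): 1.0 - p_speaking,
    }


ACTIONS = ["wait", "announce_turn"]
far = select_action(make_belief(400.0, 0.8), ACTIONS)
near = select_action(make_belief(70.0, 0.8), ACTIONS)
print(far, near)
```

Under these assumed penalties, the agent stays silent while the driver is probably speaking and the turn is still far away, and it accepts the cost of interrupting once waiting would risk a missed turn. A fuller treatment would simulate over longer horizons and richer user and environment models.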