Instruction modes for joint spatial reference between naive users and a mobile robot

This paper reports on an experiment addressing different modes of natural language instruction in spatial human-robot interaction. The experimental setting involves a mobile robot equipped with an elementary dialogue system and human users, unfamiliar with the system, who are required to achieve joint spatial reference with the robot in either spoken or written mode. In addition, the robot's output is varied between an initial scene description (indicating the robot's conceptual and linguistic knowledge) and no initial output. To interpret the linguistic instructions, our robot uses a computational model of spatial reference based on psycholinguistic evidence and on previous experiments. Results show that the model correctly interprets the intended kinds of instructions, that scene descriptions can encourage users to refer directly to the goal object, and that there are lasting negative effects on communication when potentially successful spoken instructions are not recognized by the speech component.