The Role of the Timing between Multimodal Robot Behaviors for Joint Action

Introduction & Previous Work That timing in HRI is important has been clear since Suchman (1987) demonstrated the crucial role of timing in all interactions with technology. She showed that if a system's response is delayed, users, for whom the system's behavior is not transparent, interpret the lack of a timely response as failure and initiate a new action; the system may then respond with the previously initiated behavior, abort the current action and start the next behavior, or the whole process may end in error. Thus, users expect a timely response to their actions as a precondition for joint action. However, besides timeliness in response to human action, timing also plays a role in the coordination of the robot's own actions. This is particularly evident in embodied conversational agents; much previous work concerns the synchronization of speech, gaze and gesture (e.g. Skantze et al. 2014; Mehlmann et al. 2014), and these studies show that multimodal integration contributes considerably to the agents' perceived 'naturalness' and 'liveliness'. However, models of multimodal processing have not been extended to the integration of speech, navigation and gesture/manipulation, i.e. actions that play a crucial role in human-robot joint action. In social robotics, much work concerns the timing of the robot's behavior with respect to the human's behavior (e.g. concerning gaze, cf. Mutlu et al. 2012; Fischer et al. 2013), yet the synchronization of robot behaviors such as body movement, speech and arm movement has rarely been addressed. What the timing between, say, movement, speech and gesture should be in order to allow for smooth joint action thus remains open. In interactions between humans, Clark & Krych (2004) have investigated how individual actions, such as holding or placing an object, function as communicative acts.
While they do not focus on timing, they show that speech and object placement are generally well coordinated, allowing the partner to infer the other's intentions and to predict the next move (cf. also Clark 2002). The appropriate timing of multimodal action thus increases the legibility of the behavior and thereby contributes to the predictability of the actor. To investigate the role of timing of multimodal robot behavior, we carried out an experiment in which the robot employed its multimodal behaviors either sequentially or in synchrony, and we analyzed the effects of this timing on joint action.