A model of joint attention for humans and machines

Joint attention is the simultaneous allocation of attention to a target as a consequence of attending to each other’s attentional states. It is an important prerequisite for successful interaction and supports the grounding of future actions. Cognitive modeling of joint attention in a virtual agent requires an operational model for behavior recognition and production. To this end, we created a declarative four-phase model (initiate / respond / feedback / focus) of joint attention based on a literature review. We applied this model to gaze communication and implemented it in the cognitive architecture of our virtual agent Max. To substantiate the model with respect to the natural timing of gaze behavior, we conducted a study on human-agent interactions in immersive virtual reality. The results show that participants preferred the agent to exhibit a timing behavior similar to their own. Building on these insights, we now aim for a process model of joint attention. We are interested in patterns of joint attention emerging in natural interactions. In the preliminary results of a human-human study, we find patterns of fixation targets and fixation durations that allow us to identify the four phases and infer the current state of joint attention.
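The four-phase model can be pictured as a simple finite-state machine over gaze events. The following sketch is purely illustrative: the event labels and transition structure are assumptions for exposition, not the implementation used in the agent Max.

```python
from enum import Enum, auto

class Phase(Enum):
    IDLE = auto()
    INITIATE = auto()   # initiator shifts gaze to the target
    RESPOND = auto()    # responder follows the gaze to the target
    FEEDBACK = auto()   # initiator checks back at the responder
    FOCUS = auto()      # both parties attend to the target together

# Hypothetical gaze-event labels; transitions follow the four-phase order.
TRANSITIONS = {
    (Phase.IDLE, "initiator_gazes_target"): Phase.INITIATE,
    (Phase.INITIATE, "responder_follows_gaze"): Phase.RESPOND,
    (Phase.RESPOND, "initiator_checks_partner"): Phase.FEEDBACK,
    (Phase.FEEDBACK, "both_fixate_target"): Phase.FOCUS,
}

def step(phase, event):
    """Advance the joint-attention phase; unmatched events keep the phase."""
    return TRANSITIONS.get((phase, event), phase)

events = ["initiator_gazes_target", "responder_follows_gaze",
          "initiator_checks_partner", "both_fixate_target"]
phase = Phase.IDLE
for e in events:
    phase = step(phase, e)
print(phase.name)  # FOCUS
```

Such a declarative phase structure is what allows both recognition (inferring the current phase from observed fixation targets and durations) and production (selecting the agent's next gaze action) to operate over the same state.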