Abstract

Spoken interactions are known for accurate timing and alignment between interlocutors: turn-taking and topic flow are managed in a manner that provides conversational fluency and smooth progress of the task. This paper studies the relation between the interlocutors' eye-gaze and spoken utterances, and describes our experiments on turn alignment. We conducted classification experiments with a Support Vector Machine (SVM) on turn-taking, using dialogue act, eye-gaze, and speech prosody features in conversation data. The results demonstrate that eye-gaze features are important signals in turn management, and that they seem even more important than speech features when the intention of the utterances is clear.

Index Terms: eye-gaze, dialogue, interaction, speech analysis, turn-taking

1. Introduction

The role of eye-gaze in fluent communication has long been acknowledged ([2]; [7]). Previous research has established close relations between eye-gaze and conversational feedback ([3]), building trust and rapport, as well as focus of shared attention ([15]). Eye-gaze is also an important turn-taking signal: the interlocutors typically signal their wish to yield the turn by gazing at the partner, leaning back, and dropping in pitch and loudness, and the partner can accordingly start preparing to take the turn. There is evidence that lack of eye contact decreases turn-taking efficiency in video-conferencing ([16]), and that the coupling of speech and gaze streams in a word acquisition task can improve performance significantly ([11]).

Several computational models of eye-gaze behaviour for artificial agents have also been designed. For instance, [9] describe an eye-gaze model for believable virtual humans, [13] demonstrate gaze modelling for conversational engagement, and [10] present an eye-gaze model to ground information in interactions with embodied conversational agents.

Our research focuses on turn-taking and eye-gaze alignment in natural dialogues, and especially on the role of eye-gaze as a means to coordinate and control turn-taking. In our previous work [5,6] we noticed that in multi-party dialogues the participants' head movement was important in signalling turn-taking, possibly because it is more visible than eye-gaze. (This agrees with [12], who observed that in virtual environments head tracking seems sufficient when people turn their heads to look, but eye-tracking is needed to discern a person's gaze when they look at an object without turning their head.) The main objective of the current research is to explore the relation between eye-gaze and speech, in particular how the annotated turn and dialogue features and the automatically recognized speech properties affect turn management. Methodologically, our research relies on experimentation and observation: signal-level measurements and analysis of gaze and speech are combined with human-level observation of dialogue events (dialogue acts and turn-taking). We use our three-party dialogue data, which is analysed with respect to the interlocutors' speech and annotated with dialogue act, eye-gaze, and turn-taking features [6]. The experiments deal with the classification of turn-taking events using the analysed features, and the results show that eye-gaze and speech information significantly improves accuracy compared with classification based on dialogue act information only. It is also interesting, however, that the difference between gaze and speech features is not significant, i.e. eye-gaze and speech are both important signals in turn management, but their effect is parallel rather than complementary. Moreover, eye-gaze seems to be more important than speech when the intention of the utterance is clear.

The paper is structured as follows. We first describe the research on turn-taking and the alignment of speech and gaze in Section 2. We then present our data and speech analysis in Section 3, and the experimental results as well as a discussion of their importance in Section 4. Section 5 draws conclusions and points to future research.
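As an illustration of the kind of classification setup described above, the following Python sketch trains an SVM on per-utterance feature vectors that combine a dialogue act label with gaze and prosodic measurements. It is a minimal sketch only: the feature encoding, feature names, and data are hypothetical, and the paper's actual feature set and toolchain are not specified here.

```python
# Illustrative sketch only: a minimal SVM turn-taking classifier in the spirit of
# the experiments described above. All feature values below are hypothetical.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Each row is one utterance: [dialogue-act id, gaze-at-partner ratio,
# mean F0 (Hz), mean intensity (dB)] -- an assumed feature encoding.
X = np.array([
    [0, 0.80, 210.0, 62.0],
    [1, 0.15, 180.0, 55.0],
    [0, 0.65, 220.0, 64.0],
    [2, 0.10, 175.0, 50.0],
    [1, 0.70, 205.0, 61.0],
    [2, 0.20, 170.0, 52.0],
])
# Target: 1 = speaker yields the turn after the utterance, 0 = speaker holds it.
y = np.array([1, 0, 1, 0, 1, 0])

# Scale the features and fit an RBF-kernel SVM as a stand-in for the SVM setup.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=3)
print("Cross-validated accuracy:", scores.mean())
```

Training such a classifier on the dialogue act column alone, and then again with the gaze and prosody columns added, mirrors the kind of feature comparison reported in Section 4.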
References

[1] M. Argyle et al., "Gaze and Mutual Gaze," British Journal of Psychiatry, 1994.
[2] M. Nishida et al., "Eye-gaze experiments for conversation monitoring," IUCS, 2009.
[3] C. Goodwin, "Action and embodiment within situated human interaction," 2000.
[4] R. Vertegaal et al., "GAZE-2: conveying eye contact in group video conferencing using eye-controlled camera direction," CHI '03, 2003.
[5] A. Kendon, "Some functions of gaze-direction in social interaction," Acta Psychologica, 1967.
[6] C. Navarretta et al., "The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena," Language Resources and Evaluation, 2007.
[7] M. J. Bowman et al., "Proceedings of the Workshop," 1978.
[8] K. Jokinen et al., "Constructive Dialogue Modelling - Speech Interaction and Rational Agents," Wiley Series in Agent Technology, 2009.
[9] T. Nishida et al., "Attentional Behaviors as Nonverbal Communicative Signals in Situated Interactions with Conversational Agents," 2007.
[10] B. Lance et al., "The Rickel Gaze Model: A Window on the Mind of a Virtual Human," IVA, 2007.
[11] M. Nishida et al., "Collecting and Annotating Conversational Eye-Gaze Data," 2010.
[12] C. L. Sidner et al., "Explorations in engagement for humans and robots," Artificial Intelligence, 2005.
[13] P. Ekman et al., "Approaches to Emotion," 1985.
[14] M. Kipp et al., "ANVIL - a generic annotation tool for multimodal dialogue," INTERSPEECH, 2001.
[15] A. Stolcke et al., "Dialogue act modeling for automatic tagging and recognition of conversational speech," Computational Linguistics, 2000.
[16] R. Wolff et al., "Communicating Eye-gaze Across a Distance: Comparing an Eye-gaze Enabled Immersive Collaborative Virtual Environment, Aligned Video Conferencing, and Being Together," IEEE Virtual Reality Conference, 2009.
[17] J. Y. Chai et al., "The Role of Interactivity in Human-Machine Conversation for Automatic Word Acquisition," SIGDIAL Conference, 2009.