Automatic analysis of children's engagement using interactional network features

We explored the automatic analysis of vocal non-verbal cues in groups of children in the context of engagement and collaborative play. For this study, we defined two types of engagement in groups of children: harmonised and unharmonised. We collected a spontaneous audiovisual corpus of groups of children collaboratively building a 3D puzzle. Using this corpus, we modelled the interactions among children with network-based features representing the centrality and similarity of interactions. Centrality measures the extent to which interactions among group members are concentrated on a specific speaker, while similarity measures how similar the group members' interactions are to one another. We examined how well these features discriminate between harmonised and unharmonised engagement situations. Unharmonised engagement situations showed high centrality and low similarity values, whereas harmonised engagement situations showed low centrality and high similarity values. These results suggest that interactional network features are promising for the automatic detection of engagement at the group level.
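The abstract does not give formulas for the two features, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that every change of speaker between consecutive turns counts as one directed interaction. Centrality is computed here as a Freeman-style degree centralization (0 when interactions are spread evenly, 1 when every interaction involves the same child). Similarity is computed as the mean pairwise cosine similarity of the members' interaction profiles. The function names, the turn-list input format, and the toy turn sequences are all hypothetical illustrations, not data from the corpus.

```python
from itertools import combinations

import numpy as np


def interaction_matrix(turns, speakers):
    """Hypothetical network builder: W[i, j] counts speaker changes from i to j."""
    index = {s: k for k, s in enumerate(speakers)}
    w = np.zeros((len(speakers), len(speakers)))
    for prev, nxt in zip(turns, turns[1:]):
        if prev != nxt:  # assumption: only speaker changes are interactions (no self-loops)
            w[index[prev], index[nxt]] += 1
    return w


def degree_centralization(w):
    """Freeman-style degree centralization of a weighted network, in [0, 1]."""
    n = w.shape[0]
    if n < 3:
        raise ValueError("centralization is undefined for fewer than 3 members")
    strength = w.sum(axis=0) + w.sum(axis=1)  # in-strength + out-strength
    share = strength / strength.sum()  # fraction of interaction involving each member
    # Each interaction involves two distinct members, so max(share) <= 0.5.
    # sum(max - share_i) = n * max - 1, which a star graph maximises at (n - 2) / 2.
    return float((n * share.max() - 1) / ((n - 2) / 2))


def profile_similarity(w):
    """Mean pairwise cosine similarity of members' interaction profiles."""
    a = w + w.T  # undirected interaction counts between each pair of members
    norms = np.linalg.norm(a, axis=1)
    sims = []
    for i, j in combinations(range(a.shape[0]), 2):
        if norms[i] == 0 or norms[j] == 0:
            sims.append(0.0)  # a silent member shares no interaction pattern
        else:
            sims.append(a[i] @ a[j] / (norms[i] * norms[j]))
    return float(np.mean(sims))


if __name__ == "__main__":
    speakers = ["A", "B", "C", "D"]
    # Toy sequences: one child mediates every exchange vs. all pairs interact equally.
    star = ["A", "B", "A", "C", "A", "D", "A"]
    balanced = ["A", "B", "C", "A", "D", "B", "A", "C", "D", "C", "B", "D", "A"]
    for name, turns in [("star", star), ("balanced", balanced)]:
        w = interaction_matrix(turns, speakers)
        c, s = degree_centralization(w), profile_similarity(w)
        print(f"{name}: centralization={c:.2f}, similarity={s:.2f}")
    # star: centralization=1.00, similarity=0.50
    # balanced: centralization=0.00, similarity=0.67
```

On these toy inputs the concentrated pattern yields high centrality and low similarity, and the evenly shared pattern yields the reverse, which mirrors the direction reported for unharmonised and harmonised engagement. Other centrality or similarity definitions (for example, kernel-based profile similarity) could equally be substituted.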