Multimodal Detection of Engagement in Groups of Children Using Rank Learning

In collaborative play, children exhibit different levels of engagement: some engage with other children while others play alone. In this study, we investigated multimodal detection of individual engagement levels using a ranking method and two non-verbal features, turn-taking and body movement. First, we automatically extracted turn-taking and body movement features in naturalistic and challenging settings. Second, we adopted an ordinal annotation scheme and employed a ranking method to account for the considerable heterogeneity and temporal dynamics of engagement in these interactions. We showed that engagement can be characterised by relative levels between children: a ranking method, Ranking SVM, outperformed conventional SVM classification. While neither turn-taking nor body movement features alone achieved promising results, combining the two yielded a significant error reduction, demonstrating their complementary power.
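To make the rank-learning idea concrete, the sketch below casts Ranking SVM as a binary SVM trained on pairwise feature differences (the standard pairwise transform): within each play session, every pair of children whose ordinal engagement annotations differ yields a difference vector, and the learned linear scores induce a relative engagement ordering. This is a minimal illustration, not the authors' exact pipeline; the feature names, toy data, and the use of scikit-learn's LinearSVC are assumptions for demonstration.

```python
# Minimal sketch: Ranking SVM via the pairwise transform.
# Assumed setup: one (turn-taking, body-movement) feature row per child,
# ordinal engagement labels, and a `groups` array marking play sessions.
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y, groups):
    """Build difference vectors for within-session pairs of children
    whose ordinal engagement labels differ."""
    X_pairs, y_pairs = [], []
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        for a in idx:
            for b in idx:
                if y[a] > y[b]:                  # child a annotated as more engaged
                    X_pairs.append(X[a] - X[b])
                    y_pairs.append(1)
                    X_pairs.append(X[b] - X[a])  # mirrored pair keeps classes balanced
                    y_pairs.append(-1)
    return np.asarray(X_pairs), np.asarray(y_pairs)

# Toy data (illustrative only): turn-taking rate and body-movement energy.
X = np.array([[0.8, 0.6], [0.2, 0.1], [0.5, 0.7], [0.1, 0.3]])
y = np.array([3, 1, 2, 1])            # ordinal engagement annotations
groups = np.array([0, 0, 1, 1])       # two play sessions of two children each

X_p, y_p = pairwise_transform(X, y, groups)
ranker = LinearSVC(fit_intercept=False, C=1.0).fit(X_p, y_p)

# The linear scores rank children by relative engagement within a session.
scores = X @ ranker.coef_.ravel()
print(np.argsort(-scores))            # indices from most to least engaged
```

Because the model is trained on within-session differences, it predicts relative engagement between children rather than absolute levels, which matches the paper's finding that engagement is best characterised relative to the other children in the interaction.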
