Body movements and laughter recognition: experiments in first encounter dialogues

This paper reports on the automatic analysis of laughter and human body movements in a video corpus of human-human dialogues. We use the Nordic First Encounters video corpus, in which participants meet each other for the first time. The corpus includes manual annotations of the participants' head, hand, and body movements as well as of laughter occurrences. We apply machine learning methods to the corpus using two types of features: visual features, which describe bounding boxes around the participants' heads and bodies and allow body movements to be detected automatically in the video, and audio features extracted from the participants' spoken contributions. We then correlate the speech and video features and apply neural network techniques to predict whether a person is laughing given a sequence of video features. Our hypothesis is that laughter occurrences and body movements are synchronized, or at least that there is a significant relation between laughter activity and the occurrence of body movements. Our results confirm this hypothesis of synchrony between body movements and laughter, but they also highlight the complexity of the problem and the need for further investigation of the feature sets and algorithms used.
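
The abstract does not spell out how body movements are detected inside the annotated bounding boxes; one common realisation of such a step is dense optical flow computed per frame within each box. The sketch below is a minimal illustration under that assumption, not the paper's implementation: the function name `movement_magnitudes`, the video path, and the `(x, y, w, h)` box format are hypothetical placeholders.

```python
# Minimal sketch (assumed, not the authors' pipeline): per-frame body-movement
# magnitude inside a bounding box, using OpenCV's dense Farneback optical flow.
import cv2

def movement_magnitudes(video_path, box):
    """Return the mean optical-flow magnitude per frame inside box (x, y, w, h)."""
    x, y, w, h = box
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return []
    prev = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    magnitudes = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        curr = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        # Dense two-frame motion estimation based on polynomial expansion.
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        magnitudes.append(float(mag.mean()))  # higher mean flow = more movement
        prev = curr
    cap.release()
    return magnitudes
```

A sequence of such magnitudes (possibly one per annotated region: head, hands, body) would then serve as the video feature stream referred to in the abstract.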
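For the prediction step, the abstract says only that "neural network techniques" are applied to a sequence of video features. One plausible reading is a small recurrent classifier over fixed-length feature windows, as in the PyTorch sketch below; the LSTM architecture, layer sizes, and window length are assumptions, not the paper's model.

```python
# Hedged sketch: binary laughing/not-laughing prediction from a window of
# per-frame movement features. Architecture and dimensions are assumed.
import torch
import torch.nn as nn

class LaughterClassifier(nn.Module):
    def __init__(self, n_features=4, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # single logit: laughing vs. not

    def forward(self, x):                 # x: (batch, frames, n_features)
        _, (h, _) = self.lstm(x)          # final hidden state summarises the window
        return self.head(h[-1]).squeeze(-1)

model = LaughterClassifier()
loss_fn = nn.BCEWithLogitsLoss()
# Dummy batch: 8 windows of 25 frames, 4 movement features per frame.
x = torch.randn(8, 25, 4)
y = torch.randint(0, 2, (8,)).float()
loss = loss_fn(model(x), y)
loss.backward()
```

A window would presumably be labelled positive when it overlaps an annotated laughter occurrence; that labelling scheme is likewise an assumption here, not a detail given in the abstract.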
