Automatic identification of head movements in video-recorded conversations: can words help?

We present an approach in which an SVM classifier learns to identify head movements from measurements of velocity, acceleration, and jerk, the third derivative of position with respect to time. The trained classifier is then used to add head-movement annotations to new video data. Evaluated against manual annotations of the same data, the automatic annotation reaches an accuracy of 68%, and the results show that including jerk as a feature improves accuracy. We then investigate the overlap between temporal sequences classified as movement or non-movement and the speech stream of the person performing the gesture. The statistics derived from this analysis indicate that adding word features may further increase the accuracy of the model.
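The classification setup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation (which used LIBSVM on Anvil-derived measurements): it assumes a 1-D head-position track sampled at a fixed frame rate, derives velocity, acceleration, and jerk with finite differences, and trains an RBF SVM to separate movement from non-movement frames on synthetic data. All names and parameters here are illustrative.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def kinematic_features(positions, fps=25.0):
    """Per-frame velocity, acceleration, and jerk from a 1-D
    head-position track (e.g. a pixel coordinate over time)."""
    dt = 1.0 / fps
    velocity = np.gradient(positions, dt)
    acceleration = np.gradient(velocity, dt)
    jerk = np.gradient(acceleration, dt)  # third derivative of position
    return np.column_stack([velocity, acceleration, jerk])

rng = np.random.default_rng(0)
fps = 25.0
t = np.arange(0, 8, 1.0 / fps)

# Synthetic track: a still head with a nod-like oscillation in the middle.
positions = 0.2 * rng.standard_normal(t.size)
moving = (t > 3) & (t < 5)
positions[moving] += 10 * np.sin(2 * np.pi * 3 * t[moving])

X = kinematic_features(positions, fps)
y = moving.astype(int)  # 1 = movement, 0 = non-movement

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
accuracy = clf.score(X, y)
print(f"training accuracy: {accuracy:.2f}")
```

In this sketch the movement frames are easily separable because their velocity and jerk magnitudes dwarf the noise floor; on real video the features are noisier, which is where the paper's evaluation against manual annotation (68% accuracy) and the proposed word features come in.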
