Automatic Identification of Speakers From Head Gestures in a Narration

In this work, we focus on quantifying the speaker-identity information encoded in the head gestures of speakers as they narrate a story. We hypothesize that head gestures over a long duration exhibit speaker-specific patterns. To establish this, we consider a classification problem of identifying speakers from their head gestures. We represent every head orientation as a triplet of Euler angles and a sequence of head orientations as a head gesture. We use a database of recordings from 24 speakers, each narrating ten stories, with head movements recorded using a motion capture device. We obtain the best speaker identification accuracy of 0.836 using head gestures over a duration of 40 seconds. Furthermore, when a recording is longer than the window length, the accuracy increases by combining decisions from multiple 40-second windows; we achieve an average accuracy of 0.9875 on our database when the entire recording is used. Analysis of the speaker identification performance over 40-second windows across a recording reveals that speaker-identity information is more prevalent in some parts of a story than in others.
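The windowing and decision-combination scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sampling rate, the non-overlapping windowing, and the use of majority voting to fuse per-window predictions are all assumptions for the sketch.

```python
from collections import Counter

import numpy as np


def split_into_windows(angles, fps, window_sec=40):
    """Split a (T, 3) sequence of Euler-angle triplets (e.g. yaw, pitch,
    roll per frame) into non-overlapping fixed-length windows.
    Frames beyond the last full window are discarded."""
    win_len = int(window_sec * fps)
    n_windows = len(angles) // win_len
    return [angles[i * win_len:(i + 1) * win_len] for i in range(n_windows)]


def combine_decisions(window_predictions):
    """Fuse per-window speaker predictions into one decision for the
    whole recording. Majority voting is an assumed fusion rule here;
    the paper only states that window decisions are combined."""
    return Counter(window_predictions).most_common(1)[0][0]


# Example: a 120-second recording at 100 frames/sec yields three
# 40-second windows; a hypothetical classifier labels each window.
recording = np.zeros((12000, 3))  # placeholder Euler-angle sequence
windows = split_into_windows(recording, fps=100, window_sec=40)
per_window_labels = [5, 5, 12]  # e.g. classifier output per window
speaker = combine_decisions(per_window_labels)
```

With this setup, a single misclassified window need not change the recording-level decision, which is consistent with the reported accuracy gain from combining windows.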
