PromDrum - Exploiting the prosody-gesture link for intuitive, fast and fine-grained prominence annotation

Established prominence annotation methods each have drawbacks: simple binary scales may be too coarse to capture fine-grained prominence differences, while multi-level annotation schemes have been shown to be time-consuming and difficult for non-expert annotators to use. This study proposes a novel method for fast, fine-grained prominence annotation that exploits the prosody-gesture link. Native German participants listened to audio recordings sentence by sentence and reiterated each sentence by beating on an electronic drum pad, either once per syllable (experiment 1) or once per word (experiment 2), modulating the strength of each beat according to how strongly the syllable or word stood out in the sentence. The velocity profiles of the MIDI output were then interpreted as correlates of perceived prominence and compared with fine-grained prominence ratings by three expert annotators. While word-level drumming correlated highly with conventional ratings for some subjects, inexperienced participants often had considerable difficulty performing the task. Syllable-level drumming, on the other hand, proved to be a time-efficient and intuitive method for experienced and naive subjects alike. In particular, pooling velocity results from several participants into mean values maintained high levels of correlation with expert prominence ratings.
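The pooling and comparison step described above can be illustrated with a minimal sketch. It assumes that each participant yields one MIDI velocity (0-127) per syllable, that pooling is a plain per-syllable mean, and that agreement with expert ratings is measured with a Pearson correlation; the abstract does not specify the correlation measure or any normalization, so these choices are assumptions. The function names and all numbers are hypothetical and serve only as illustration; they are not data from the study.

```python
from math import sqrt
from statistics import mean


def pool_velocities(profiles):
    """Average per-syllable MIDI velocities across participants.

    `profiles` holds one list of velocities per participant; all lists
    must cover the same syllables in the same order.
    """
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("all profiles must have the same number of syllables")
    return [mean(column) for column in zip(*profiles)]


def pearson(x, y):
    """Pearson correlation coefficient (assumed measure, see lead-in)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with >= 2 items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Made-up velocities for one five-syllable sentence, three participants.
drummed = [
    [40, 95, 35, 70, 30],
    [55, 110, 50, 80, 45],
    [30, 90, 40, 85, 25],
]
# Made-up expert prominence ratings for the same five syllables.
expert = [10, 28, 8, 20, 6]

pooled = pool_velocities(drummed)
r = pearson(pooled, expert)
print(f"pooled velocities: {pooled}")
print(f"correlation with expert ratings: {r:.3f}")
```

Averaging before correlating reduces the influence of any single participant's idiosyncratic beat strength, which is one plausible reading of why pooled values track expert ratings well.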
