An information-theoretic framework for automated discovery of prosodic cues to conversational structure

Interaction timing in conversation is highly variable, yet it is patently not random. Identifying its regularities, however, has required labor-intensive manual analysis, and findings have been limited. We propose a conditional mutual information measure of the influence of prosodic features, which can be computed for any conversation at any instant and requires only a speech/non-speech segmentation. We evaluate the methodology on two segmental features: energy and speaking rate. Results indicate that energy, the less controversial of the two, is in fact better on average at predicting conversational structure. We also explore the temporal evolution of model “surprise”, which makes it possible to identify the instants at which each feature's influence is operative. The method corroborates earlier findings and appears capable of supporting large-scale, data-driven discovery in future research.
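To make the central quantity concrete, the following is a minimal sketch of a plug-in (relative-frequency) estimate of conditional mutual information I(X; Y | Z) for discrete variables. It is not the estimator used in this work, whose details are not given here. The function name and the reading of the arguments (x as the upcoming speech/non-speech state, y as a quantized prosodic feature, z as the speech/non-speech history) are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits.

    x, y, z are equal-length sequences of hashable discrete symbols.
    Hypothetical reading: x = next speech/non-speech state,
    y = quantized prosodic feature, z = speech/non-speech history.
    """
    if not (len(x) == len(y) == len(z)):
        raise ValueError("x, y and z must have the same length")
    n = len(x)
    if n == 0:
        return 0.0

    # Joint and marginal counts needed for the estimate.
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)

    cmi = 0.0
    for (xi, yi, zi), n_xyz in c_xyz.items():
        # p(x,y,z) * log2[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ];
        # the 1/n factors inside the ratio cancel, leaving raw counts.
        ratio = n_xyz * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        cmi += (n_xyz / n) * log2(ratio)
    return cmi
```

A value near zero suggests that, under this simple estimate, the feature adds little beyond what the history already predicts. Larger values suggest that it carries additional information about the upcoming state. Plug-in estimates are biased upward on small samples, so a practical system would likely need smoothing.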
