Toward Exploring the Role of Disfluencies from an Acoustic Point of View: A New Aspect of (Dis)continuous Speech Prosody Modelling

Many studies of spoken language understanding rely on idealized, fluent utterances, and disfluencies are often regarded as mere noise in the speech flow. Other works argue that fragmented structures, disfluencies, and silent and filled pauses are important and can aid understanding. Extending the original concept of speech disfluency, the current paper brings in the acoustic level and treats the discontinuity of F0 as a parallel to speech disfluencies. The advantages and disadvantages of using a continuous F0 estimate in prosodic event detection tasks are analysed in detail for formal and informal speaking styles. Results suggest that, unlike in read formal speech, using a continuous, fully interpolated F0 curve is counterproductive in spontaneous informal speech. Comparing the behaviour of speech disfluencies with the effect of F0 contour discontinuity raises more general considerations about modelling philosophy: the results suggest that disfluencies in informal speech may themselves be informative entities, reflected also in the acoustic-level organization of speech. This in turn suggests that disfluencies in general are an important perceptual cue in human speech understanding.
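To make the contrast between a discontinuous and a continuous F0 contour concrete, the sketch below fills unvoiced frames by linear interpolation between the nearest voiced frames. It is only an illustration under stated assumptions, not the interpolation procedure used in the paper: the function name `interpolate_f0`, the convention that unvoiced frames are coded as 0.0, and the sample values are all hypothetical.

```python
from typing import List


def interpolate_f0(f0: List[float], unvoiced: float = 0.0) -> List[float]:
    """Return a continuous F0 track by filling unvoiced frames.

    Assumption (not from the paper): unvoiced frames are marked with
    the value `unvoiced`, and gaps are bridged by linear interpolation.
    """
    voiced = [i for i, v in enumerate(f0) if v != unvoiced]
    if not voiced:
        return list(f0)  # no voiced frame to anchor the interpolation

    out = list(f0)
    # Hold the first/last voiced value over leading/trailing unvoiced frames.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    # Bridge each interior gap linearly between its voiced neighbours.
    for left, right in zip(voiced, voiced[1:]):
        span = right - left
        for i in range(left + 1, right):
            out[i] = f0[left] + (f0[right] - f0[left]) * (i - left) / span
    return out


# Hypothetical frame-level F0 values in Hz; 0.0 marks unvoiced frames.
raw = [0.0, 120.0, 0.0, 0.0, 150.0, 0.0]
continuous = interpolate_f0(raw)
```

The raw track keeps the voicing gaps as explicit discontinuities, while the interpolated track hides them. The abstract's finding is that hiding them in this way appears to help for read formal speech but not for spontaneous informal speech.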
