Reliable prominence identification in English spontaneous speech

This paper presents a follow-up to a study on the automatic detection of prosodic prominence in spontaneous speech. Prosodic prominence involves two different prosodic features, pitch accent and stress, which are typically based on four acoustic parameters: fundamental frequency (F0) movements, overall syllable energy, syllable nucleus duration and mid-to-high-frequency emphasis. Careful measurement of these acoustic parameters makes it possible to build an automatic system that identifies prominent syllables in utterances with performance comparable to the inter-human agreement reported in the literature, even when tested on spontaneous speech.
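To make the idea of combining the four acoustic parameters concrete, the following is a minimal sketch, not the system described in the paper. It assumes the per-syllable measurements are already available; the field names (`f0_movement`, `energy`, `nucleus_duration`, `spectral_emphasis`), the equal weighting of the parameters, and the threshold value are all illustrative assumptions rather than details taken from the study.

```python
from statistics import mean, pstdev

# Hypothetical field names for the four acoustic parameters named in the abstract.
FEATURES = ("f0_movement", "energy", "nucleus_duration", "spectral_emphasis")


def zscores(values):
    """Normalize values to zero mean and unit variance within one utterance."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # All syllables share the same value, so the feature carries no contrast.
        return [0.0] * len(values)
    return [(v - m) / sd for v in values]


def prominence_scores(syllables):
    """Return one score per syllable: the mean of its normalized features.

    Equal weighting is an assumption made for illustration only.
    """
    normalized = {k: zscores([s[k] for s in syllables]) for k in FEATURES}
    return [
        sum(normalized[k][i] for k in FEATURES) / len(FEATURES)
        for i in range(len(syllables))
    ]


def prominent_indices(syllables, threshold=0.5):
    """Return indices of syllables whose score exceeds an assumed threshold."""
    scores = prominence_scores(syllables)
    return [i for i, score in enumerate(scores) if score > threshold]
```

Normalizing within each utterance keeps the comparison relative to the neighbouring syllables, which is one plausible way to handle speaker and recording differences; a real system would need weights and thresholds tuned against annotated data.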
