Automatic prosodic prominence detection in speech using acoustic features: an unsupervised system

This paper presents work in progress on the automatic detection of prosodic prominence in continuous speech. Prosodic prominence involves two distinct phonetic features: pitch accent, which is tied to fundamental frequency (F0) movements and overall syllable energy, and stress, which correlates strongly with syllable nucleus duration and mid-to-high-frequency emphasis. By measuring these acoustic parameters, it is possible to build an automatic system that identifies prominent syllables with a level of agreement with human-tagged data comparable to the inter-human agreement reported in the literature. The system requires no training phase, additional information, or annotation; it is not tailored to a specific data set and can be easily adapted to different languages.
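The abstract names the four acoustic parameters but not how they are combined, so the following is only a minimal sketch of one plausible unsupervised scheme, not the authors' method. It assumes the per-syllable measurements are already available, normalizes each parameter within the utterance (so no training data is needed), and flags a syllable when either a stress cue (nucleus duration plus mid-to-high-frequency emphasis) or a pitch-accent cue (overall energy plus F0 movement) exceeds a threshold. The `Syllable` class, the averaging rule, and the `threshold` value are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Syllable:
    """Per-syllable acoustic measurements (field names and units are illustrative)."""
    nucleus_duration: float   # seconds
    spectral_emphasis: float  # energy in a mid-to-high frequency band, dB
    overall_energy: float     # full-band energy, dB
    f0_movement: float        # size of the F0 excursion, semitones


def zscores(values: List[float]) -> List[float]:
    """Normalize values within one utterance; a constant series maps to zeros."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def detect_prominence(syllables: List[Syllable], threshold: float = 0.5) -> List[bool]:
    """Flag prominent syllables without any training phase.

    Assumption: the stress and pitch-accent cues are each the mean of two
    z-scored parameters, and a syllable is prominent if either cue exceeds
    the threshold. The source does not specify this combination rule.
    """
    dur = zscores([s.nucleus_duration for s in syllables])
    emph = zscores([s.spectral_emphasis for s in syllables])
    energy = zscores([s.overall_energy for s in syllables])
    f0 = zscores([s.f0_movement for s in syllables])

    flags = []
    for d, e, en, f in zip(dur, emph, energy, f0):
        stress_cue = (d + e) / 2    # duration + mid-to-high-frequency emphasis
        accent_cue = (en + f) / 2   # overall energy + F0 movement
        flags.append(max(stress_cue, accent_cue) > threshold)
    return flags


# Made-up measurements for a four-syllable utterance.
utterance = [
    Syllable(0.06, 40.0, 60.0, 0.5),
    Syllable(0.15, 55.0, 72.0, 4.0),
    Syllable(0.07, 41.0, 61.0, 0.4),
    Syllable(0.06, 39.0, 60.0, 0.6),
]
prominent = detect_prominence(utterance)
```

Because every parameter is normalized relative to the other syllables in the same utterance, a sketch of this kind depends on no speaker-specific or language-specific constants other than the threshold, which is the property the abstract highlights.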
