Automatic detection of prosodic prominence by means of acoustic analyses

Prosodic prominence is commonly regarded as the perceptual salience of a linguistic unit relative to its environment. However, there is still no consensus on how it should be measured subjectively, or on how it relates to objectively measurable acoustic events and to linguistic structures such as lexical stress and prosodic focus. Here we concentrate mainly on the identification of prominence by means of acoustic parameters and automatic techniques. Several questions on this topic remain open in the community: (a) How can we reliably define and represent prosodic prominence? (b) What is the best domain for prominence in acoustic analysis? (c) Is prominence a continuous or a discrete phenomenon? (d) Which acoustic parameters support it, and how can they be combined to identify prominence reliably? (e) To what extent are these acoustic parameters language-specific, and can we identify universals across languages? (f) What is the best paradigm for the automatic identification of prominence: rule-based or machine-learning systems? (g) How should automatic systems be evaluated? This contribution briefly addresses these points.
