Analyzing the Contribution of Top-Down Lexical and Bottom-Up Acoustic Cues in the Detection of Sentence Prominence

Recent work has suggested that prominence perception could be driven by the predictability of the acoustic prosodic features of speech. At the same time, lexical predictability and part-of-speech information are also known to correlate with prominence. In this paper, we investigate how bottom-up acoustic and top-down lexical cues contribute to sentence prominence by using both types of features in unsupervised and supervised systems for automatic prominence detection. The study is conducted on a corpus of Dutch continuous speech with manually annotated prominence labels. Our results show that the unpredictability of speech patterns is a consistent and important cue for prominence at both the lexical and acoustic levels, and that lexical predictability and part-of-speech information can be used as efficient features in supervised prominence classifiers.
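
To make the notion of lexical unpredictability concrete, the following is a minimal sketch of word-level surprisal under a bigram language model. It is an illustration only and not the system described in the paper: the function name `bigram_surprisal`, the add-alpha smoothing, and the toy Dutch sentences are all assumptions chosen for brevity. A higher surprisal value marks a word that is less predictable from its left context, which is the kind of word the abstract associates with prominence.

```python
import math
from collections import Counter


def bigram_surprisal(train_sentences, sentence, alpha=1.0):
    """Return (word, surprisal in bits) pairs for `sentence`.

    Hypothetical helper: surprisal is -log2 P(word | previous word),
    estimated with an add-alpha smoothed bigram model. This smoothing
    is an assumption made for the sketch, not the paper's method.
    """
    context_counts = Counter()  # how often each word occurs as a left context
    bigram_counts = Counter()   # how often each (previous, current) pair occurs
    vocab = set()

    for words in train_sentences:
        tokens = ["<s>"] + list(words)  # "<s>" marks the sentence start
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    # Reserve one extra vocabulary slot for words never seen in training.
    vocab_size = len(vocab) + 1

    scored = []
    tokens = ["<s>"] + list(sentence)
    for prev, cur in zip(tokens, tokens[1:]):
        prob = (bigram_counts[(prev, cur)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        scored.append((cur, -math.log2(prob)))
    return scored


if __name__ == "__main__":
    # Toy training data (assumed, not from the corpus used in the paper).
    train = [
        ["de", "kat", "slaapt"],
        ["de", "hond", "slaapt"],
        ["de", "kat", "eet"],
    ]
    for word, bits in bigram_surprisal(train, ["de", "hond", "eet"]):
        print(f"{word}\t{bits:.2f} bits")
```

In an unsupervised setting, such per-word scores could be thresholded or ranked within an utterance to propose prominent words; in a supervised setting they could be passed to a classifier as one feature among others, alongside acoustic measures.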
