Detecting Pitch Accent Using Pitch-corrected Energy-based Predictors

Previous work has shown that the energy components of frequency subbands with a variety of frequencies and bandwidths predict pitch accent with various degrees of accuracy, and produce correct predictions for distinct subsets of data points. In this paper, we describe a series of experiments exploring techniques to leverage the predictive power of these energy components by including pitch and duration features – other known correlates to pitch accent. We perform these experiments on Standard American English read, spontaneous and broadcast news speech, each corpus containing at least four speakers. Using an approach by which we correct energy-based predictions using pitch and duration information prior to using a majority voting classifier, we were able to detect pitch accent in read, spontaneous and broadcast news speech at 84.0%, 88.3% and 88.5% accuracy, respectively. Human performance at pitch accent detection is generally taken to be between 85% and 90%. Index Terms: prosodic analysis, spectral emphasis

[1]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[2]  Mark Hasegawa-Johnson,et al.  A Maximum Likelihood Prosody Recognizer , 2004 .

[3]  Mattias Heldner,et al.  Spectral emphasis as an additional source of information in accent detection , 2001 .

[4]  Paul Boersma,et al.  Praat, a system for doing phonetics by computer , 2002 .

[5]  Vincent J. van Heuven,et al.  Acoustic correlates of linguistic stress and accent in Dutch and American English , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[6]  Barbara Peskin,et al.  TOWARDS ROBUST SPEAKER SEGMENTATION: THE ICSI-SRI FALL 2004 DIARIZATION SYSTEM , 2004 .

[7]  Andreas Stolcke,et al.  Recent innovations in speech-to-text transcription at SRI-ICSI-UW , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Candace L. Sidner,et al.  Attention, Intentions, and the Structure of Discourse , 1986, CL.

[9]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[10]  Alex Waibel,et al.  Prosody and speech recognition , 1988 .

[11]  Rodolfo Delmonte,et al.  SLIM prosodic automatic tools for self-learning instruction , 2000, Speech Commun..

[12]  V. V. van Heuven,et al.  Spectral balance as a cue in the perception of linguistic stress. , 1997, The Journal of the Acoustical Society of America.

[13]  Johan Liljencrants,et al.  Acoustic-phonetic Analysis of Prominence in Swedish , 2000 .

[14]  Julia Hirschberg,et al.  Discourse Structure in Spoken Language: Studies on Speech Corpora , 1995 .

[15]  Agaath M. C. Sluijter,et al.  Spectral balance as an acoustic correlate of linguistic stress. , 1996, The Journal of the Acoustical Society of America.

[16]  Mari Ostendorf,et al.  TOBI: a standard for labeling English prosody , 1992, ICSLP.

[17]  Ralf Kompe,et al.  Prosody in Speech Understanding Systems , 1997, Lecture Notes in Computer Science.

[18]  Mari Ostendorf,et al.  Automatic labeling of prosodic patterns , 1994, IEEE Trans. Speech Audio Process..

[19]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[20]  Xuejing Sun,et al.  Pitch accent prediction using ensemble machine learning , 2002, INTERSPEECH.

[21]  Giuseppe Riccardi,et al.  Prosody recognition from speech utterances using acoustic and linguistic based models of prosodic events , 1999, EUROSPEECH.

[22]  Julia Hirschberg,et al.  On the correlation between energy and pitch accent in read English speech , 2006, INTERSPEECH.

[23]  Mattias Heldner,et al.  A focus detector using overall intensity and high frequency emphasis , 1999 .

[24]  Shrikanth S. Narayanan,et al.  An automatic prosody recognizer using a coupled multi-stream acoustic model and a syntactic-prosodic language model , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[25]  Fabio Tamburini,et al.  Automatic prominence identification and prosodic typology , 2005, INTERSPEECH.

[26]  Paul Christopher Bagshaw,et al.  Automatic prosodic analysis for computer aided pronunciation teaching , 1994 .

[27]  Mari Ostendorf,et al.  The use of prosody in syntactic disambiguation , 1991 .