Prosody-dependent acoustic modeling using variable-parameter hidden Markov models

To make prosody useful in spontaneous speech recognition, we adopt quasi-continuous prosodic labels and design a corresponding prosody-dependent acoustic model to improve ASR performance. We propose a variable-parameter hidden Markov model (HMM) in which the mean vector of each Gaussian is a function of the prosody variable, expressed through a polynomial regression model. The prosodically adapted acoustic models are used to re-score the N-best output of a standard ASR system, according to the prosody variable assigned by an automatic prosody detector. Experiments on the Buckeye corpus demonstrate the effectiveness of our approach.
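The core idea of a prosody-dependent mean and N-best re-scoring can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single diagonal-covariance Gaussian per state, a scalar prosody variable `v`, and a simple weighted sum of baseline and prosody-dependent scores. The names `poly_mean`, `log_gauss`, `frame_loglik`, `rescore`, and the weight `lam` are hypothetical.

```python
import math


def poly_mean(coeffs, v):
    """Mean vector mu(v) = sum_k coeffs[k] * v**k.

    coeffs[k] is the k-th order coefficient vector (one value per feature
    dimension); v is the scalar prosody variable (an assumption of this sketch).
    """
    dim = len(coeffs[0])
    return [sum(c[d] * v ** k for k, c in enumerate(coeffs)) for d in range(dim)]


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * s) + (xi - m) ** 2 / s)
        for xi, m, s in zip(x, mean, var)
    )


def frame_loglik(x, v, coeffs, var):
    """Log-likelihood of frame x when the state mean is shifted by prosody v."""
    return log_gauss(x, poly_mean(coeffs, v), var)


def rescore(nbest, lam=1.0):
    """Pick the best hypothesis from an N-best list.

    nbest holds (hypothesis, baseline_score, prosody_dependent_score) tuples;
    lam is a hypothetical interpolation weight, not a value from the paper.
    """
    return max(nbest, key=lambda h: h[1] + lam * h[2])[0]


if __name__ == "__main__":
    # First-order polynomial in two feature dimensions (illustrative values).
    coeffs = [[1.0, 0.0], [2.0, -1.0]]
    print(poly_mean(coeffs, 0.5))
    print(frame_loglik([2.0, -0.5], 0.5, coeffs, [1.0, 1.0]))
    print(rescore([("hyp a", -10.0, -5.0), ("hyp b", -11.0, -3.0)]))
```

In a full system the polynomial coefficients would be estimated from training data and the per-frame scores accumulated along the HMM state alignment of each hypothesis; both steps are omitted here.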
