Isolated word intonation recognition using hidden Markov models

A method is described for recognizing intonation patterns using discrete-distribution hidden Markov models (HMMs) and vector quantization techniques. Fundamental frequency and energy features were used to determine the best combination of feature processing and quantization techniques for recognizing statement, question, command, calling, and continuation intonation patterns in isolated words. A recognition accuracy of 89% was achieved for the best-case speaker- and word-independent performance. Human listeners achieved 77% accuracy on a 100-word subset, compared to 83% for the HMMs on the same subset.
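The general pipeline named above (vector quantization of fundamental frequency and energy frames, followed by scoring with one discrete HMM per intonation class) can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in the paper: the three-entry codebook, the two-state left-to-right topology, all probability values, and the names `quantize`, `log_likelihood`, `classify`, `CODEBOOK`, and `MODELS` are hypothetical, and only two of the five intonation classes are modelled.

```python
import math

# Hypothetical codebook over normalized (F0, energy) frames: low, mid, high pitch.
CODEBOOK = [(0.0, 0.5), (0.5, 0.5), (1.0, 0.5)]

# Hypothetical two-state left-to-right models: (start, transitions, emissions).
# Emission rows give P(codeword index | state); values are made up for illustration.
START = [1.0, 0.0]
TRANS = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "statement": (START, TRANS, [[0.1, 0.2, 0.7], [0.7, 0.2, 0.1]]),  # falling pitch
    "question": (START, TRANS, [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]),   # rising pitch
}


def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword (squared Euclidean)."""
    symbols = []
    for frame in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(frame, cw)) for cw in codebook]
        symbols.append(dists.index(min(dists)))
    return symbols


def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: returns log P(symbols | discrete HMM)."""
    n = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbols[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def classify(frames, codebook, models):
    """Quantize the frames and return the label of the highest-scoring model."""
    symbols = quantize(frames, codebook)
    return max(models, key=lambda name: log_likelihood(symbols, *models[name]))


rising = [(0.1, 0.5), (0.2, 0.6), (0.8, 0.5), (0.9, 0.4)]
falling = list(reversed(rising))
print(classify(rising, CODEBOOK, MODELS))
print(classify(falling, CODEBOOK, MODELS))
```

In a real system the codebook would be trained from data (for example with a vector quantizer design algorithm) and the HMM parameters estimated from labelled utterances; here both are fixed by hand purely to make the scoring step concrete.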
