Paper information - Using prosody for the improvement of ASR - sentence modality recognition

In the Laboratory of Speech Acoustics, research on automatic speech recognition (ASR) has been carried out in which we looked for ways to support the higher linguistic processing levels of ASR, at the syntactic and semantic levels, through acoustic preprocessing of supra-segmental (prosodic) features. The subject of this article is semantic-level processing built on supra-segmental parameters. HMM models of sentence modality types were built by training the recognizer on speech databases labelled according to modality type, and a simple set of rules for connecting modalities was used as the linguistic model. The best recognition results were obtained when the HMM clause-type models had 11 states, each with 2 Gaussian components. With these settings, the accuracy of modality type recognition was 71% for Hungarian and 78% for German, even though the database was small for both languages.
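The model configuration reported above (11 states per modality model, 2 Gaussian components per state) can be illustrated with a minimal sketch. Only the two counts come from the abstract. The left-to-right topology, the `stay_prob` value, the scalar feature, and all emission parameters are assumptions chosen for illustration, and the function names are hypothetical rather than taken from the authors' recognizer.

```python
import math

N_STATES = 11    # states per modality model (from the abstract)
N_MIXTURES = 2   # Gaussian components per state (from the abstract)


def left_to_right_transitions(n_states, stay_prob=0.5):
    """Build a left-to-right transition matrix (assumed topology).

    Each state either stays put or advances to the next state;
    the final state only loops on itself.
    """
    trans = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        trans[i][i] = stay_prob
        trans[i][i + 1] = 1.0 - stay_prob
    trans[-1][-1] = 1.0
    return trans


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        norm = 1.0 / math.sqrt(2.0 * math.pi * var)
        total += w * norm * math.exp(-((x - mu) ** 2) / (2.0 * var))
    return math.log(total)


transitions = left_to_right_transitions(N_STATES)

# Hypothetical emission parameters for a single state: two components
# over one normalized prosodic feature (for example, a pitch value).
state_weights = [0.6, 0.4]
state_means = [0.0, 1.0]
state_variances = [1.0, 0.5]
assert len(state_weights) == N_MIXTURES

score = gmm_log_likelihood(0.2, state_weights, state_means, state_variances)
```

In a full recognizer, one such model would be trained per modality type, and an utterance would be assigned the modality whose model scores its prosodic feature sequence highest, subject to the connection rules of the linguistic model.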

György Szaszák | Klára Vicsi
