Speech recognizer for preparing medical reports: Development experiences of a Hungarian speaker independent continuous speech recognizer

A development tool (MKBF 1.0) for constructing continuous speech recognizers has been created under Windows XP. The system is based on a statistical approach (HMM phoneme models and bi-gram language models with non-linear smoothing) and works in real time. The tool can build medium-sized speech recognizers with vocabularies of 1000-20000 words. New solutions have been developed for acoustic pre-processing, for the statistical model building of phonemes, and at the syntactic level. In our experiments, different training sets were used with different vocabularies. Hungarian is a strongly agglutinative language, so the number of word forms is very high. For this reason, two forms of bi-gram linguistic model were constructed: a traditional model based on word forms and a morpheme-based model, in which the vocabulary is much smaller. This article presents the test results and the experience drawn from them. Recognition accuracy increased considerably with perplexity-based linguistic adaptation.
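As an illustration of the language-model side only, the following is a minimal sketch of a bi-gram model with absolute discounting (one common non-linear smoothing scheme) and of a perplexity-based choice between candidate models. It is not the MKBF 1.0 implementation: the abstract does not specify the smoothing formula or the adaptation procedure, so the class `BigramModel`, the function `pick_model`, the discount value, the add-one unigram fallback, and the toy English sentences are all assumptions made for the example.

```python
import math
from collections import Counter


class BigramModel:
    """Bi-gram model with absolute discounting (illustrative assumption)."""

    def __init__(self, sentences, discount=0.5):
        self.discount = discount
        self.unigrams = Counter()   # c(w): how often w is predicted
        self.history = Counter()    # c(h): how often h occurs as left context
        self.bigrams = Counter()    # c(h, w)
        self.followers = Counter()  # number of distinct words seen after h
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            self.unigrams.update(tokens[1:])  # <s> is never predicted
            for prev, word in zip(tokens, tokens[1:]):
                if (prev, word) not in self.bigrams:
                    self.followers[prev] += 1
                self.bigrams[(prev, word)] += 1
                self.history[prev] += 1
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # +1 for an unknown-word class

    def unigram_prob(self, word):
        # Add-one smoothed unigram, so unseen words keep non-zero probability.
        return (self.unigrams[word] + 1) / (self.total + self.vocab_size)

    def prob(self, prev, word):
        h = self.history[prev]
        if h == 0:  # unseen history: fall back to the unigram estimate
            return self.unigram_prob(word)
        # Subtract a fixed discount from every seen bi-gram count ...
        discounted = max(self.bigrams[(prev, word)] - self.discount, 0.0) / h
        # ... and redistribute the freed mass through the unigram model.
        freed_mass = self.discount * self.followers[prev] / h
        return discounted + freed_mass * self.unigram_prob(word)

    def perplexity(self, sentences):
        log_sum, count = 0.0, 0
        for sent in sentences:
            tokens = ["<s>"] + list(sent) + ["</s>"]
            for prev, word in zip(tokens, tokens[1:]):
                log_sum += math.log(self.prob(prev, word))
                count += 1
        return math.exp(-log_sum / count)


def pick_model(models, adaptation_sentences):
    """Return the name of the model with the lowest perplexity on the text."""
    return min(models, key=lambda name: models[name].perplexity(adaptation_sentences))


# Hypothetical toy corpora standing in for two report domains.
models = {
    "radiology": BigramModel([["no", "abnormality", "seen"],
                              ["no", "fracture", "seen"]]),
    "cardiology": BigramModel([["heart", "rate", "normal"],
                               ["heart", "rhythm", "normal"]]),
}
dictated = [["no", "fracture", "seen"]]
best = pick_model(models, dictated)
```

In this sketch, a lower perplexity on the dictated text is taken as a sign that a model matches the current domain better. The same code would apply to morpheme-based modelling by passing morpheme sequences instead of word-form sequences as the token lists.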