Abstract To handle large lexica (more than 2000 words), automatic speech recognition (ASR) systems use an internal phonetic representation of the speech signal and phonemic models of pronunciation from the lexicon to search for the spoken word sequence or sentence. This makes it possible to model different pronunciations of a word in the lexicon. In German we observed that individual speakers pronounce words in a characteristic way that depends on several factors such as sex, age, place of residence, and place of birth. Our goal is to improve speech recognition by automatically adapting the pronunciation models in the lexicon to the unknown speaker. The obvious problem is that one cannot wait until the current speaker has uttered each of approximately 2000 different words at least once. We solve this problem by generalizing observed rules of pronunciation variation to words not yet observed. A second method presented in this paper is speaker adaptation by re-estimating the a posteriori probabilities of the phonetic units used in a “bottom-up” ASR system. A word hypothesis is scored by the product of the a posteriori probabilities that relate the phonetic units produced by the classifier to the phonetic units belonging to the word hypothesis. Normally these probabilities are estimated during training of the ASR system and remain fixed during testing. We propose an algorithm that observes the typical confusions of phonetic units for the unknown speaker and continuously adapts the a posteriori probabilities.
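The second method can be illustrated with a minimal sketch. The abstract does not specify the update rule, so everything below is an assumption made for illustration: the class name `ConfusionAdapter`, the `prior_weight` parameter, the additive count update, and the one-to-one alignment between recognized units and the units of the word hypothesis are all hypothetical and are not taken from the paper.

```python
import math


class ConfusionAdapter:
    """Hypothetical sketch: running estimate of P(intended unit | recognized unit).

    The training-time probabilities are treated as pseudo-counts, and each
    confusion observed for the current speaker adds to them. This additive
    update is an assumed rule, not the algorithm from the paper.
    """

    def __init__(self, prior, prior_weight=10.0):
        # prior[recognized][intended] = probability estimated during training.
        # prior_weight controls how strongly the training estimate resists change.
        self.counts = {
            recognized: {unit: p * prior_weight for unit, p in row.items()}
            for recognized, row in prior.items()
        }

    def prob(self, intended, recognized):
        # Normalize the pseudo-counts of one recognized unit into a probability.
        row = self.counts[recognized]
        return row.get(intended, 0.0) / sum(row.values())

    def observe(self, intended, recognized, weight=1.0):
        # Record one confusion (or correct classification) of the current speaker.
        row = self.counts[recognized]
        row[intended] = row.get(intended, 0.0) + weight

    def score(self, hypothesis_units, recognized_units):
        # Log of the product of a posteriori probabilities; assumes the two
        # sequences are already aligned unit by unit.
        floor = 1e-12  # avoids log(0) for pairs never seen in training
        return sum(
            math.log(max(self.prob(h, r), floor))
            for h, r in zip(hypothesis_units, recognized_units)
        )


# Toy training-time estimates for two phonetic units (invented numbers).
prior = {"a": {"a": 0.9, "e": 0.1}, "e": {"e": 0.8, "a": 0.2}}
adapter = ConfusionAdapter(prior)
before = adapter.score(["e", "e"], ["a", "e"])

# Suppose this speaker's intended "e" is repeatedly classified as "a".
for _ in range(10):
    adapter.observe(intended="e", recognized="a")
after = adapter.score(["e", "e"], ["a", "e"])
```

In this toy setting, a word hypothesis containing "e" where the classifier reports "a" receives a higher score after adaptation than before. Treating the training estimates as pseudo-counts keeps the early adapted probabilities close to the speaker-independent values until enough speaker-specific evidence has accumulated.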