Automatically Generated Models for Unknown Words

Recognition of spontaneous speech in particular must cope with the occurrence of unknown words. We present an approach to unknown word detection that is integrated into a standard HMM speech recognizer. From the context-dependent sub-word units (e.g. triphones) found in the training database, a generic word model is derived automatically, using the context restrictions of the units to form valid sequences of sub-word units. This generic word model combines automatically derived knowledge about the phonotactics of the language under consideration with the modelling quality of context-dependent acoustic units. Unknown words are detected by adding this model to the recognizer’s lexicon. We present results of experiments carried out on a large German spontaneous speech recognition task.
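
The way context restrictions can chain triphones into a generic word model may be illustrated with a minimal sketch. It is not the implementation used in this work, and it rests on several assumptions: triphones are represented as (left, centre, right) tuples, the symbol "#" marks a word boundary, and the function names (triphones_of, build_generic_word_model, accepts) are hypothetical. A triphone (l, c, r) may be followed only by a triphone whose left context is c and whose centre is r. Acoustic scoring by the HMMs is left out entirely; only the phonotactic structure is shown.

```python
from collections import defaultdict

BOUNDARY = "#"  # assumed symbol for the word-boundary context


def triphones_of(phones):
    """Convert a phone sequence into word-internal triphones (l, c, r)."""
    padded = [BOUNDARY] + list(phones) + [BOUNDARY]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]


def build_generic_word_model(inventory):
    """Build a graph over the triphones observed in training.

    Returns (starts, ends, successors): the triphones that may begin a
    word, those that may end one, and the allowed transitions.
    """
    inventory = set(inventory)
    by_left_centre = defaultdict(set)
    for left, centre, right in inventory:
        by_left_centre[(left, centre)].add((left, centre, right))

    successors = {}
    for left, centre, right in inventory:
        if right == BOUNDARY:
            successors[(left, centre, right)] = set()  # word-final unit
        else:
            # Context restriction: next unit has left == centre, centre == right.
            successors[(left, centre, right)] = set(
                by_left_centre.get((centre, right), ())
            )

    starts = {t for t in inventory if t[0] == BOUNDARY}
    ends = {t for t in inventory if t[2] == BOUNDARY}
    return starts, ends, successors


def accepts(phones, model):
    """Check whether a phone sequence is a valid path through the model."""
    starts, ends, successors = model
    path = triphones_of(phones)
    if not path or path[0] not in starts or path[-1] not in ends:
        return False
    return all(nxt in successors.get(prev, ())
               for prev, nxt in zip(path, path[1:]))


# Toy training lexicon (hypothetical): only "mana" and "tan" were seen.
training_words = [["m", "a", "n", "a"], ["t", "a", "n"]]
inventory = [t for word in training_words for t in triphones_of(word)]
model = build_generic_word_model(inventory)

print(accepts(["m", "a", "n"], model))  # True: unseen word, seen triphones
print(accepts(["n", "a", "t"], model))  # False: (#, n, a) was never observed
```

In this toy example the phone sequence "m a n" is not in the training lexicon, yet the model accepts it because every triphone it needs was observed and the contexts chain consistently. In a recognizer, such a graph would presumably be entered into the lexicon as one additional entry whose states are the triphone HMMs.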