Modelling Non-verbal Sounds for Speech Recognition

When speech understanding systems are used in real applications, they encounter incidental noise generated by the speaker and the environment. Such noises can cause serious problems for speech recognizers that are not designed to cope with them. We attempt to model these noises by training hidden Markov model (HMM) "noise words" to match classes of noises. The noise words were incorporated into the Sphinx system, and its performance was compared with that of the system without noise words. Initial results suggest that the technique significantly improves system performance.