Improving robustness of speech recognition performance to aggregate of noises by two-dimensional visualization

This paper proposes a new methodology for improving the robustness of speech recognition to an aggregate of noises by means of a two-dimensional visualization technique. First, as many noises as possible are collected from adverse environments. Then, a hidden Markov model (HMM) is trained for each collected noise. The set of trained HMMs is projected onto a two-dimensional map using a statistical multidimensional scaling technique called the COSMOS method. The noises whose HMMs lie on the periphery of the map are overlaid on the clean speech used to train the HMMs of the acoustic models. The results show that this methodology significantly reduces the recognition error rate, by around 60%, for non-stationary noises overlaid within the voice interval of a word.
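The selection and overlay steps described above can be illustrated with a minimal sketch. It is not the COSMOS method itself: classical multidimensional scaling is used here as a stand-in, the pairwise distances between noise HMMs are assumed to be precomputed, "periphery" is interpreted as distance from the centroid of the map, and the noise is mixed at a chosen signal-to-noise ratio. All function names and parameters (`classical_mds`, `peripheral_indices`, `overlay_noise`, `k`, `snr_db`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def classical_mds(dist, dim=2):
    """Embed points in `dim` dimensions from a symmetric distance matrix.

    Stand-in for the COSMOS mapping (assumption): classical MDS via
    double centering and eigendecomposition.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dim]      # keep the largest `dim`
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def peripheral_indices(coords, k):
    """Return indices of the k points farthest from the map centroid.

    Assumption: 'periphery' is measured as distance from the centroid.
    """
    radii = np.linalg.norm(coords - coords.mean(axis=0), axis=1)
    return np.argsort(radii)[::-1][:k]


def overlay_noise(speech, noise, snr_db):
    """Add noise to speech, scaled to the requested SNR in dB.

    Assumption: mixing at a fixed SNR; the noise is looped or truncated
    to match the length of the speech signal.
    """
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

In use, `dist` would hold one row and column per noise HMM. The indices returned by `peripheral_indices(classical_mds(dist), k)` pick the noise recordings to pass to `overlay_noise` together with each clean training utterance.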