Improvement of Speech Recognition Method Using Speech Production Mechanism

This study attempts to incorporate the human speech production mechanism into automatic speech recognition (ASR) by using articulatory movements as a constraint. A preliminary experiment was first conducted on a set of articulatory data, in which the articulatory data were treated in the HMM in the same way as the acoustic data. Recognition accuracy increased when the articulatory data were added to the HMM directly, indicating that the articulatory data carry additional information that benefits ASR. We then incorporated the articulatory data as a hidden parameter in an ASR system built on a hybrid HMM/BN model [1]. Monophone recognition experiments were conducted with this model in two settings: with an individual model for each speaker and with a single model for all speakers. The accuracy obtained with the HMM/BN model was higher than that of the standard HMM without the articulatory data. This study thus shows one way to incorporate the speech production mechanism into ASR systems.
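
As an illustration of the preliminary experiment, one plausible reading of treating articulatory data "in the same way as the acoustic data" is that the two streams are joined frame by frame into a single observation vector before HMM training. The sketch below assumes that reading. The function name `fuse_features` and the feature dimensions (39 acoustic coefficients, 14 articulatory coordinates) are hypothetical and are not taken from the study.

```python
import numpy as np


def fuse_features(acoustic, articulatory):
    """Concatenate per-frame acoustic and articulatory vectors.

    Both inputs are arrays of shape (num_frames, dim). This assumes the
    two streams are already time-aligned at the same frame rate.
    """
    acoustic = np.asarray(acoustic, dtype=float)
    articulatory = np.asarray(articulatory, dtype=float)
    if acoustic.shape[0] != articulatory.shape[0]:
        raise ValueError("streams must have the same number of frames")
    # Each output row is one HMM observation: [acoustic | articulatory].
    return np.hstack([acoustic, articulatory])


# Hypothetical sizes chosen only for illustration.
acoustic = np.zeros((100, 39))
articulatory = np.ones((100, 14))
observations = fuse_features(acoustic, articulatory)
```

With this kind of fusion the articulatory measurements must be available at recognition time, which is a practical reason to model them as a hidden parameter instead, as in the HMM/BN approach described above.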