Using multi-level segmentation coefficients to improve HMM speech recognition

This paper presents a new kind of acoustic feature for HMM speech recognition. These features try to capture phone-specific segmentation information using multiple temporal resolutions. Experiments show that word accuracy improves by 7% when these features are combined with traditional mel-cepstral coefficients in a speaker-independent word recogniser. The improvement is due mostly to fewer insertion and deletion errors.
