A unified way in incorporating segmental feature and segmental model into HMM

There are two major approaches to speech recognition: frame-based and segment-based. The frame-based approach, e.g. the HMM, assumes that the observations within each state are statistically independent and identically distributed, and it imposes only weak duration constraints. The segment-based approach is computationally expensive, and it models speech only coarsely unless many 'templates' are stored. This paper presents a new framework that incorporates the segmental feature and the segmental model into the frame-based HMM in a unified way, to exploit the advantages of both methods. In the modified Viterbi algorithm, frame-based information selects the most probable path at each segment level, so that the segmental model can be applied to it with a dramatically reduced computational load; at the same time, the segmental score refines the score obtained by the frame-based model at each level. In this way, the best path found at the end by the Viterbi algorithm is optimal.
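The idea of letting frame-level scores choose the path into each segment and then refining it with a segment-level score can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the left-to-right topology, the single shared `log_stay` and `log_move` transition scores, the function name `segmental_viterbi`, and the user-supplied `seg_score` callback are all assumptions made for illustration. The sketch keeps only one surviving path per state, so it makes no claim about reproducing the optimality result stated above.

```python
import math


def segmental_viterbi(frame_logp, log_stay, log_move, seg_score):
    """Frame-synchronous Viterbi with a segment-level score added at each
    state exit (hypothetical sketch, left-to-right HMM).

    frame_logp[t][s]      : frame log-likelihood of frame t in state s
    log_stay, log_move    : log transition scores (assumed shared by all states)
    seg_score(s, a, b)    : segmental log-score for state s covering frames a..b-1
    Returns (best_score, segments) with segments[s] = (start, end).
    """
    T, S = len(frame_logp), len(frame_logp[0])
    NEG = -math.inf
    score = [NEG] * S          # best score of a path ending in state s
    starts = [None] * S        # entry frames of states 0..s on that path
    score[0], starts[0] = frame_logp[0][0], (0,)

    for t in range(1, T):
        new_score, new_starts = [NEG] * S, [None] * S
        for s in range(S):
            best, best_starts = NEG, None
            # Stay in the same state: frame-based score only.
            if score[s] > NEG:
                best, best_starts = score[s] + log_stay, starts[s]
            # Enter from the previous state: the segment for s-1 is now
            # complete, so its segmental score refines the frame-based one.
            if s > 0 and score[s - 1] > NEG:
                cand = (score[s - 1] + log_move
                        + seg_score(s - 1, starts[s - 1][-1], t))
                if cand > best:
                    best, best_starts = cand, starts[s - 1] + (t,)
            if best > NEG:
                new_score[s] = best + frame_logp[t][s]
                new_starts[s] = best_starts
        score, starts = new_score, new_starts

    if score[S - 1] == NEG:
        raise ValueError("no path reaches the final state")
    # Close the last segment at the end of the utterance.
    final = score[S - 1] + seg_score(S - 1, starts[S - 1][-1], T)
    bounds = list(starts[S - 1]) + [T]
    segments = [(bounds[i], bounds[i + 1]) for i in range(S)]
    return final, segments
```

Because the segmental score is evaluated only on the one path that survives into each state, `seg_score` is called at most once per state per frame, rather than once for every possible segment boundary pair. A callback such as `lambda s, a, b: -abs((b - a) - 2)` would act as a simple duration penalty; a trajectory or template model could be substituted in the same place.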
