A comparison of trajectory and mixture modeling in segment-based word recognition

A mechanism is presented for implementing mixtures at the phone-subsegment (microsegment) level for continuous word recognition based on the stochastic segment model (SSM). The tradeoffs between trajectory modeling and mixture modeling in segment-based word recognition are investigated. Experimental results are reported on DARPA's speaker-independent Resource Management corpus. The results suggest a tradeoff between mixture models and trajectory models that depends on the level of detail of the modeling unit. They support the use of whole-segment models in the context-dependent case, and of microsegment-level (and possibly segment-level) mixtures rather than frame-level mixtures.
