The importance of segmentation probability in segment based speech recognizers

In segment-based recognizers, variable-length speech segments are mapped to the basic speech units (phones, diphones, ...). We address the acoustic modeling of these basic units in the framework of segmental posterior distribution models (SPDM). The joint posterior probability of a unit sequence u_ and a segmentation s_, Pr(u_,s_|X_), can be written as the product of the segmentation probability Pr(s_|X_) and the unit classification probability Pr(u_|s_,X_), where X_ is the sequence of acoustic observation parameter vectors. In particular, we point out the role of the segmentation probability and demonstrate that it improves recognition accuracy. We present evidence for this on two different tasks (speaker-dependent continuous word recognition in French and speaker-independent phone recognition in American English) in combination with two different unit classification models.
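
The factorization Pr(u_,s_|X_) = Pr(s_|X_) Pr(u_|s_,X_) can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two candidate segmentations, all probability values, the unit labels "a" and "b", the helper names `best_units` and `decode`, and the simplification that units are classified independently per segment. It only shows how dropping the segmentation term can change which hypothesis wins.

```python
import math

# Hypothetical toy values; none of these numbers come from the paper.
# Each candidate pairs a segmentation s_ (frame spans) with log Pr(s_|X_)
# and, for every segment, an assumed distribution over units.
candidates = [
    {
        "segmentation": [(0, 4), (4, 9)],
        "log_p_seg": math.log(0.7),
        "unit_posteriors": [{"a": 0.6, "b": 0.4}, {"a": 0.3, "b": 0.7}],
    },
    {
        "segmentation": [(0, 2), (2, 6), (6, 9)],
        "log_p_seg": math.log(0.3),
        "unit_posteriors": [
            {"a": 0.9, "b": 0.1},
            {"a": 0.2, "b": 0.8},
            {"a": 0.9, "b": 0.1},
        ],
    },
]


def best_units(unit_posteriors):
    """Return the most probable unit per segment and the summed log probability.

    Assumes (for this sketch only) that units are classified independently
    given the segmentation, so log Pr(u_|s_,X_) is a sum over segments.
    """
    units, log_p = [], 0.0
    for dist in unit_posteriors:
        unit = max(dist, key=dist.get)
        units.append(unit)
        log_p += math.log(dist[unit])
    return units, log_p


def decode(candidates, use_segmentation_prob=True):
    """Pick the candidate maximizing log Pr(s_|X_) + log Pr(u_|s_,X_).

    With use_segmentation_prob=False the segmentation term is dropped,
    i.e. only the unit classification probability is used.
    """
    best = None
    for cand in candidates:
        units, log_p_units = best_units(cand["unit_posteriors"])
        score = log_p_units
        if use_segmentation_prob:
            score += cand["log_p_seg"]
        if best is None or score > best[0]:
            best = (score, units, cand["segmentation"])
    return best


with_seg = decode(candidates, use_segmentation_prob=True)
without_seg = decode(candidates, use_segmentation_prob=False)
print("with Pr(s_|X_):   ", with_seg[1], with_seg[2])
print("without Pr(s_|X_):", without_seg[1], without_seg[2])
```

In this toy setup the classification-only score favors the three-segment hypothesis, because its per-segment unit probabilities are high, while the joint score favors the two-segment hypothesis, whose segmentation is assumed to be more probable. Working in the log domain keeps the product of probabilities numerically stable for longer utterances.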
