Telephone Speech Recognition via the Combination of Knowledge Sources in a Segmental Speech Model

The currently dominant speech recognition methodology, Hidden Markov Modeling, treats speech as a stochastic process with very simple mathematical properties. The simplistic assumptions of the model, especially the independence of the observation vectors, have been widely criticized in the literature, and alternative solutions have been proposed. One such alternative is segmental modeling, and the OASIS recognizer we have been developing in recent years belongs to this category. In this paper we go one step further and propose treating speech recognition as a knowledge source combination problem. We offer a generalized algorithmic framework for this approach and show that both hidden Markov and segmental modeling are special cases of this decoding scheme. In the second part of the paper we describe the current components of the OASIS system and evaluate its performance on a very difficult recognition task, the phonetically balanced sentences of the MTBA Hungarian Telephone Speech Database. Our results show that OASIS outperforms a traditional HMM system in phoneme classification and achieves practically the same recognition scores at the sentence level.
