Lattice segmentation and support vector machines for large vocabulary continuous speech recognition

Lattice segmentation procedures are used to spot possible recognition errors in first-pass recognition hypotheses produced by a large vocabulary continuous speech recognition system. This approach is analyzed in terms of its ability to reliably identify incorrectly hypothesized words and to provide good alternatives for them. A procedure is described to train and apply support vector machines to strengthen the first-pass system where it was found to be weak, resulting in small but statistically significant recognition improvements on a large test set of conversational speech.
