Paper information - Support vector machines for automatic data cleanup

Support vector machines for automatic data cleanup

Accurate training data plays a very important role in training effective acoustic models for speech recognition. In conversational speech, the transcribed data in several cases has a significant word error rate, which leads to poor acoustic models. In this paper we explore a method to automatically identify such mislabelled data in the context of a hybrid support vector machine (SVM)/hidden Markov model (HMM) system, thereby building accurate acoustic models. The effectiveness of this method is demonstrated on both synthetic and real speech data. A hybrid system for OGI Alphadigits using this methodology gives a significant improvement in performance over a comparable baseline HMM system.

Joseph Picone | Aravind Ganapathiraju

[1] Hervé Bourlard, et al. Connectionist speech recognition, 1993.

[2] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[3] Steve Renals, et al. Confidence measures for hybrid HMM/ANN speech recognition, 1997, EUROSPEECH.

[4] Vladimir Vapnik, et al. The Nature of Statistical Learning, 1995.

[5] Joseph Picone, et al. Support vector machines for speech recognition, 1998, ICSLP.

[6] Federico Girosi, et al. An improved training algorithm for support vector machines, 1997, Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop.

[7] Joseph Picone, et al. Hybrid SVM/HMM architectures for speech recognition, 2000, INTERSPEECH.