Support vector machines for automatic data cleanup

Accurate training data is essential for training effective acoustic models for speech recognition. For conversational speech, the transcribed data in some cases has a significant word error rate, which leads to poor acoustic models. In this paper we explore a method for automatically identifying such mislabelled data in the context of a hybrid support vector machine (SVM)/hidden Markov model (HMM) system, so that more accurate acoustic models can be built. The effectiveness of this method is demonstrated on both synthetic and real speech data. A hybrid system for OGI Alphadigits built with this methodology gives a significant improvement in performance over a comparable baseline HMM system.
