Keyword Spotting Based on Phoneme Confusion Matrix

For many practical applications of keyword spotting, the input signal is spontaneous conversation, while the acoustic model is trained on read speech because read-speech data are more readily available. In general, a keyword spotting system degrades significantly because of this mismatch between the acoustic model and spontaneous speech. To address this problem, this paper presents a two-pass keyword spotting strategy. To improve retrieval performance, an improved phoneme confusion matrix is adopted. It gives more freedom in the representation, which alleviates the effects of mismatched training conditions and of phoneme misrecognition. Furthermore, a hybrid confidence measure is applied to reject false alarms. Experiments show that the proposed algorithms significantly reduce the equal error rate (EER) on a telephone conversational task.
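To make the idea of confusion-matrix matching and the EER metric concrete, here is a minimal sketch. It is not the authors' improved confusion matrix or their hybrid confidence measure, neither of which is specified here. It assumes that the first pass emits a 1-best phoneme string and that a confusion matrix storing P(recognized phoneme | reference phoneme) has been estimated elsewhere. The second pass then scores a keyword's phoneme sequence against any span of that string with a confusion-weighted alignment. The function names, the `<eps>` symbol for insertions and deletions, the probability floor, and the toy matrix values are all hypothetical.

```python
import math
from typing import Dict, Sequence

# Hypothetical convention: "<eps>" as a recognized phoneme means the reference
# phoneme was deleted; "<eps>" as a reference phoneme means an inserted phoneme.
EPS = "<eps>"
ConfusionMatrix = Dict[str, Dict[str, float]]

# Toy confusion matrix with made-up probabilities, for illustration only.
TOY_CM: ConfusionMatrix = {
    "s": {"s": 0.8, "sh": 0.15, EPS: 0.05},
    "sh": {"sh": 0.8, "s": 0.15, EPS: 0.05},
    "i": {"i": 0.9, EPS: 0.1},
    EPS: {"s": 0.1, "sh": 0.1, "i": 0.1},
}


def log_prob(cm: ConfusionMatrix, ref: str, hyp: str, floor: float = 1e-6) -> float:
    """Log P(hyp | ref), floored so unseen confusions are penalized, not forbidden."""
    return math.log(max(cm.get(ref, {}).get(hyp, 0.0), floor))


def keyword_match_score(keyword: Sequence[str], hyp: Sequence[str],
                        cm: ConfusionMatrix) -> float:
    """Best log score of `keyword` aligned to any span of `hyp`, per keyword phoneme.

    dp[i][j] is the best score for aligning keyword[:i] to a span of hyp that
    ends at position j. The span may start and end anywhere in hyp.
    """
    m, n = len(keyword), len(hyp)
    if m == 0:
        raise ValueError("keyword must contain at least one phoneme")
    dp = [[float("-inf")] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        dp[0][j] = 0.0  # free start: the keyword may begin at any hyp position
    for i in range(1, m + 1):
        k = keyword[i - 1]
        for j in range(n + 1):
            best = dp[i - 1][j] + log_prob(cm, k, EPS)  # keyword phoneme deleted
            if j > 0:
                h = hyp[j - 1]
                best = max(best,
                           dp[i - 1][j - 1] + log_prob(cm, k, h),  # match or confusion
                           dp[i][j - 1] + log_prob(cm, EPS, h))    # inserted phoneme
            dp[i][j] = best
    return max(dp[m]) / m  # free end, normalized by keyword length


def equal_error_rate(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Sweep an accept-if-score>=t threshold; return (FAR+FRR)/2 where they are closest."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one true and one false trial")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)) + [float("inf")]:
        frr = sum(s < t for s in pos) / len(pos)   # true keyword hits rejected
        far = sum(s >= t for s in neg) / len(neg)  # false alarms accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

With `TOY_CM`, the keyword `["sh", "i"]` scores highest against `["a", "sh", "i", "b"]`, somewhat lower against the misrecognized `["a", "s", "i", "b"]`, and lowest against `["a", "b", "c"]`. The confusion matrix therefore lets a misrecognized occurrence still match, at a reduced score. Normalizing by keyword length is one simple way to make a single rejection threshold comparable across keywords of different lengths.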
