A Novel Confidence Measure Based on Context Consistency for Spoken Term Detection

In this paper, we propose a novel confidence measure to improve the performance of spoken term detection (STD). The proposed measure is based on the context consistency between a hypothesized word and its context in the word lattice. When calculating the context consistency of a hypothesized word, it considers not only the semantic similarity between words but also the uncertainty of the context. To measure this uncertainty, we employ the word occurrence probability, which is obtained by combining the overlapping hypotheses in the word posterior lattice. We also use two effective measures of semantic similarity to obtain a more accurate context consistency. Experiments conducted on the Hub-4NE Mandarin database show that the proposed confidence measure improves on a confidence measure that ignores the word occurrence probability of the context words.
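To make the idea concrete, the following is a minimal sketch of one way a context consistency score could weight semantic similarity by the occurrence probability of each context word. It is an illustration under stated assumptions, not the formula used in the paper: the function names, the weighted-average form, and the toy similarity values are all hypothetical, and the abstract names neither the two similarity measures nor the exact way the terms are combined.

```python
# Illustrative sketch only: the weighted-average form and all names here are
# assumptions, not the paper's actual definition of context consistency.

def weighted_consistency(word, context, similarity):
    """Average similarity to context words, weighted by occurrence probability.

    context: list of (context_word, occurrence_probability) pairs.
    similarity: callable (word, context_word) -> float.
    """
    total_weight = sum(prob for _, prob in context)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(prob * similarity(word, ctx) for ctx, prob in context)
    return weighted_sum / total_weight


def unweighted_consistency(word, context, similarity):
    """Baseline that ignores the occurrence probability of context words."""
    if not context:
        return 0.0
    return sum(similarity(word, ctx) for ctx, _ in context) / len(context)


# Toy similarity table (made-up values, for illustration only).
TOY_SIMILARITY = {("bank", "money"): 0.8, ("bank", "river"): 0.1}


def toy_similarity(word, ctx):
    return TOY_SIMILARITY.get((word, ctx), 0.0)


# "money" is a likely context word, "river" an unlikely one.
toy_context = [("money", 0.9), ("river", 0.2)]

weighted = weighted_consistency("bank", toy_context, toy_similarity)
unweighted = unweighted_consistency("bank", toy_context, toy_similarity)
print(f"weighted={weighted:.4f} unweighted={unweighted:.4f}")
```

In this toy case the unlikely context word "river" contributes less to the weighted score than to the unweighted baseline, which is the kind of effect the abstract attributes to accounting for context uncertainty.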
