Fast subword-based approach for open vocabulary spoken term detection

This paper describes an efficient two-stage approach to open vocabulary spoken term detection that combines a sub-phonetic segment N-gram index with shift continuous dynamic programming. With this two-stage search, we aim to improve both retrieval accuracy and processing time. In the speech recognition stage, a finer-grained subword unit, shorter than a phoneme, is used to minimize the effect of recognition errors. In the indexing and search stage, N-gram indexing and block addressing are adopted to improve search speed. In addition, to reduce misses during indexing, the N-best hypotheses are added directly to the inverted index. We investigate the properties of each method and examine its usefulness for the open vocabulary spoken term detection task.
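The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`build_index`, `candidates`, `spotting_distance`, `search`) is hypothetical. It assumes that subword labels are plain symbols, that each utterance is treated as one block so postings store only an utterance ID rather than exact positions, and that a simple edit-distance spotting pass stands in for shift continuous dynamic programming, which operates on continuous speech with its own local distances.

```python
from collections import defaultdict


def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def build_index(utterances, n=2):
    """Build an inverted index from subword n-grams to utterance IDs.

    utterances maps an utterance ID to a list of hypotheses, each a list
    of subword labels. Every N-best hypothesis is indexed, not only the
    1-best. Assumption: one block per utterance (coarse block addressing).
    """
    index = defaultdict(set)
    for utt_id, hyps in utterances.items():
        for hyp in hyps:
            for gram in ngrams(hyp, n):
                index[gram].add(utt_id)
    return index


def candidates(index, query, n=2, min_hits=1):
    """Stage 1: utterances sharing at least min_hits distinct query n-grams."""
    hits = defaultdict(set)
    for gram in set(ngrams(query, n)):
        for utt_id in index.get(gram, ()):
            hits[utt_id].add(gram)
    return {utt_id for utt_id, grams in hits.items() if len(grams) >= min_hits}


def spotting_distance(query, target):
    """Stage 2 stand-in: minimum edit distance between the query and any
    contiguous stretch of target (free start and free end in target)."""
    prev = [0] * (len(target) + 1)  # a match may start anywhere in target
    for i, q in enumerate(query, 1):
        cur = [i] + [0] * len(target)
        for j, t in enumerate(target, 1):
            cur[j] = min(prev[j - 1] + (q != t),  # match or substitution
                         prev[j] + 1,             # query symbol unmatched
                         cur[j - 1] + 1)          # extra target symbol
        prev = cur
    return min(prev)  # a match may end anywhere in target


def search(utterances, index, query, n=2, max_dist=1):
    """Run stage 1 to prune, then stage 2 only on the surviving utterances."""
    results = []
    for utt_id in sorted(candidates(index, query, n)):
        dist = min(spotting_distance(query, hyp) for hyp in utterances[utt_id])
        if dist <= max_dist:
            results.append((utt_id, dist))
    return results


# Toy data: letters stand in for subword labels; "utt1" has two hypotheses.
utterances = {
    "utt1": [list("kaito"), list("gaito")],
    "utt2": [list("sakura")],
}
index = build_index(utterances)
print(search(utterances, index, list("kaido")))
```

In this sketch the index lookup discards utterances that share no N-gram with the query, so the more expensive dynamic programming pass runs on only a small candidate set. Indexing all hypotheses gives the first stage more chances to retain an utterance whose 1-best hypothesis is wrong.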
