Fast decoding for open vocabulary spoken term detection

Information retrieval and spoken term detection from audio such as broadcast news, telephone conversations, conference calls, and meetings are of great interest to the academic, government, and business communities. Motivated by the requirement for high-quality indexes, this study explores the effect of using both word and sub-word information to find in-vocabulary and out-of-vocabulary (OOV) query terms. It also explores the trade-off between search accuracy and the speed of audio transcription. We present a novel, vocabulary-independent, hybrid LVCSR approach to audio indexing and search, and show that using phonetic confusions derived from posterior probabilities estimated by a neural network in the retrieval of OOV queries can help reduce misses. These methods are evaluated on data sets from the 2006 NIST Spoken Term Detection (STD) task.
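The idea of retrieving OOV queries via phonetic confusions can be illustrated with a minimal sketch: score a query's phone sequence against indexed phone sequences using a weighted edit distance whose substitution costs are cheap for confusable phone pairs. The confusion probabilities and penalty values below are illustrative assumptions, not figures from the paper, and a real system would search phone lattices rather than flat phone strings.

```python
import math

# Hypothetical phone-confusion probabilities, e.g. estimated from
# neural-network posteriors (illustrative values, not from the paper).
CONFUSION = {
    ("p", "b"): 0.30,
    ("s", "z"): 0.25,
    ("ih", "iy"): 0.20,
}

def sub_cost(a, b):
    """Substitution cost: zero for a match, cheap for confusable pairs."""
    if a == b:
        return 0.0
    p = CONFUSION.get((a, b)) or CONFUSION.get((b, a)) or 0.01
    return -math.log(p)

INS_DEL = -math.log(0.05)  # flat insertion/deletion penalty (assumed)

def phonetic_distance(query, hyp):
    """Weighted edit distance between two phone sequences."""
    n, m = len(query), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * INS_DEL
    for j in range(1, m + 1):
        d[0][j] = j * INS_DEL
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(query[i - 1], hyp[j - 1]),
                d[i - 1][j] + INS_DEL,     # deletion in hypothesis
                d[i][j - 1] + INS_DEL,     # insertion in hypothesis
            )
    return d[n][m]

# An OOV query's phone string matches a confusable hypothesis more
# cheaply than an unrelated one, so near-misses are still retrieved.
q = ["p", "ih", "z"]
close = phonetic_distance(q, ["b", "iy", "s"])
far = phonetic_distance(q, ["k", "ae", "t"])
assert close < far
```

Ranking indexed audio segments by this distance and thresholding it gives a simple fuzzy phonetic search; the paper's approach additionally exploits lattice posteriors and hybrid word/sub-word indexes.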
