Paper information - A robust/fast spoken term detection method based on a syllable n-gram index with a distance metric

For spoken document retrieval, it is crucial to handle out-of-vocabulary (OOV) words and the misrecognition of spoken words. Consequently, recognition and retrieval methods based on sub-word units have been proposed. This paper describes a Japanese spoken term detection method for spoken documents that is robust to OOV words and misrecognition. To solve the problem of OOV keywords, we use individual syllables as the sub-word unit in continuous speech recognition. To address OOV words and recognition errors while keeping retrieval fast, we propose a distant n-gram indexing/retrieval method that incorporates a distance metric in a syllable lattice. When applied to syllable sequences, our proposed method outperformed a conventional DTW method between syllable sequences and was about 100 times faster. The retrieval results show that we can detect OOV words in a database containing 44 h of audio in less than 10 ms per query with an F-measure of 0.54.
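
The general idea of looking up a query through a syllable n-gram index while tolerating recognition errors can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: it indexes 1-best syllable sequences rather than a syllable lattice, it uses a plain Hamming distance over syllable symbols as a stand-in for the paper's distance metric, and the names `build_index`, `hamming`, and `search` as well as the toy romanized syllables are hypothetical.

```python
from collections import defaultdict


def build_index(docs, n=3):
    """Map each syllable n-gram to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, syllables in docs.items():
        for pos in range(len(syllables) - n + 1):
            index[tuple(syllables[pos:pos + n])].append((doc_id, pos))
    return index


def hamming(a, b):
    """Number of positions at which two equal-length n-grams differ."""
    return sum(x != y for x, y in zip(a, b))


def search(index, query, n=3, max_dist=1):
    """Return {(doc_id, start): total_distance} for approximate query matches.

    Assumes len(query) >= n. The query is cut into n-grams at offsets
    0, n, 2n, ...; a final n-gram ending at the last syllable is added when
    the query length is not a multiple of n.
    """
    offsets = list(range(0, len(query) - n + 1, n))
    if offsets[-1] != len(query) - n:
        offsets.append(len(query) - n)

    # best[(doc_id, start)][chunk_number] = smallest distance seen so far
    best = defaultdict(dict)
    for chunk_no, off in enumerate(offsets):
        chunk = tuple(query[off:off + n])
        # Linear scan over index keys keeps the sketch short; a real system
        # would organize the index so that near n-grams are found quickly.
        for gram, postings in index.items():
            d = hamming(gram, chunk)
            if d > max_dist:
                continue
            for doc_id, pos in postings:
                start = pos - off
                if start >= 0:
                    seen = best[(doc_id, start)]
                    seen[chunk_no] = min(d, seen.get(chunk_no, d))

    # Keep only start positions where every query n-gram found a match.
    return {
        key: sum(dists.values())
        for key, dists in best.items()
        if len(dists) == len(offsets)
    }


# Toy data: hypothetical recognizer output as romanized syllables.
docs = {
    "d1": ["ko", "N", "ni", "chi", "wa", "se", "ka", "i"],
    "d2": ["o", "ha", "yo", "u"],
}
index = build_index(docs)
exact_hits = search(index, ["ni", "chi", "wa"])
noisy_hits = search(index, ["ni", "shi", "wa"])  # one substituted syllable
```

With `max_dist=0` the lookup degenerates to exact n-gram matching, so the substituted query above would find nothing; raising the threshold trades precision for robustness to recognition errors.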

Seiichi Nakagawa | Kazumasa Yamamoto | Yasuhisa Fujii | Keisuke Iwami
