A robust/fast spoken term detection method based on a syllable n-gram index with a distance metric

For spoken document retrieval, it is crucial to handle out-of-vocabulary (OOV) words and the mis-recognition of spoken words. Consequently, sub-word unit based recognition and retrieval methods have been proposed. This paper describes a Japanese spoken term detection method for spoken documents that is robust to OOV words and mis-recognition. To solve the problem of OOV keywords, we use individual syllables as the sub-word unit in continuous speech recognition. To handle OOV words and recognition errors while keeping retrieval fast, we propose a distant n-gram indexing/retrieval method that incorporates a distance metric in a syllable lattice. When applied to syllable sequences, our proposed method outperformed conventional dynamic time warping (DTW) matching between syllable sequences and was about 100 times faster. The retrieval results show that we can detect OOV words in a database containing 44 h of audio in less than 10 ms per query with an F-measure of 0.54.
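The general idea of retrieving from a syllable n-gram index while tolerating recognition errors can be illustrated with a small sketch. This is not the method of the paper: it assumes 1-best syllable sequences rather than a lattice, uses a plain substitution count (Hamming distance) as the distance metric, and scans all indexed n-grams at query time instead of using a precomputed distance-aware index. The names `build_index`, `chunk_offsets`, `search`, and the toy documents are hypothetical.

```python
from collections import defaultdict

def build_index(docs, n=3):
    """Map each syllable n-gram to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, syllables in docs.items():
        for pos in range(len(syllables) - n + 1):
            index[tuple(syllables[pos:pos + n])].append((doc_id, pos))
    return index

def chunk_offsets(m, n):
    """Offsets of n-gram chunks that together cover a query of length m."""
    offsets = list(range(0, m - n + 1, n))
    if offsets[-1] + n < m:
        offsets.append(m - n)  # last chunk overlaps so the query tail is covered
    return offsets

def search(index, query, n=3, max_dist=1):
    """Return (doc_id, start, distance) for windows where every query chunk
    matches an indexed n-gram with at most max_dist substitutions.
    n must equal the n used in build_index."""
    if len(query) < n:
        raise ValueError("query must contain at least n syllables")
    offsets = chunk_offsets(len(query), n)
    # (doc_id, start) -> {chunk offset: set of mismatched query positions}
    hits = defaultdict(dict)
    for off in offsets:
        chunk = tuple(query[off:off + n])
        for gram, postings in index.items():
            mismatched = {off + i for i, (q, g) in enumerate(zip(chunk, gram)) if q != g}
            if len(mismatched) > max_dist:
                continue
            for doc_id, pos in postings:
                hits[(doc_id, pos - off)][off] = mismatched
    results = []
    for (doc_id, start), per_chunk in hits.items():
        if len(per_chunk) == len(offsets):  # every chunk matched at this start
            bad_positions = set().union(*per_chunk.values())
            results.append((doc_id, start, len(bad_positions)))
    return sorted(results, key=lambda r: (r[2], r[0], r[1]))

# Toy "recognized" syllable sequences; d2 contains a substitution error (shi -> chi).
docs = {
    "d1": ["o", "n", "se", "i", "ni", "n", "shi", "ki"],
    "d2": ["o", "n", "se", "e", "ni", "n", "chi", "ki"],
    "d3": ["te", "n", "ki", "yo", "ho", "u"],
}
index = build_index(docs, n=3)
results = search(index, ["ni", "n", "shi", "ki"], n=3, max_dist=1)
print(results)  # [('d1', 4, 0), ('d2', 4, 1)]
```

Because the query is matched as syllables rather than as dictionary words, a term that is absent from the recognizer's vocabulary can still be found, and the distance tolerance lets the mis-recognized occurrence in `d2` surface with a lower rank than the exact match in `d1`.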
