Subword and phonetic search for detecting out-of-vocabulary keywords

We compare several approaches, individually and in combination, for spotting out-of-vocabulary (OOV) keywords, measured by their Actual Term-Weighted Value (ATWV) scores. We consider three types of recognition units (whole words, syllables, and subwords of different lengths) and two basic search strategies (whole-unit search and fuzzy phonetic search). In all cases, the search is performed by collapsing the recognition lattice into a consensus network, either over the recognized whole units or after first splitting the recognized units into phonemes. We ran experiments on five languages, with the language model and vocabulary for each derived from only 10 hours of transcriptions (70k-100k words of text), which resulted in keyword OOV rates of 10% to 63% on new data, depending on the language. We conclude that: 1) in all cases, fuzzy phonetic search on phoneme-split lattices outperforms whole-unit search; 2) syllables are the best of the subword units for OOV keyword detection with fuzzy phonetic search; and 3) these methods combine very well, sometimes yielding ATWV scores for OOV terms that are not far below those of in-vocabulary (IV) terms.
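To make the idea of fuzzy phonetic search over a phoneme-level consensus network concrete, the following is a minimal sketch and not the system evaluated here. It assumes the network is a list of slots, each mapping phoneme labels to posteriors, and it aligns the keyword's phoneme sequence to the slots with a small dynamic program. The names `fuzzy_score` and `search`, the constants `FLOOR` and `SKIP_COST`, and the limit of two extra slots are all hypothetical choices made for illustration.

```python
import math

FLOOR = 1e-3                 # assumed posterior for a phoneme missing from a slot
SKIP_COST = math.log(1e-2)   # assumed log-penalty for one insertion or deletion
NEG_INF = float("-inf")


def fuzzy_score(network, keyword, start):
    """Best length-normalized log-score for aligning `keyword` (a list of
    phonemes) to the slots of `network` beginning at index `start`."""
    n = len(keyword)
    max_slots = min(len(network) - start, n + 2)  # allow up to 2 extra slots
    # dp[i][j]: best score after consuming i slots and j keyword phonemes
    dp = [[NEG_INF] * (n + 1) for _ in range(max_slots + 1)]
    dp[0][0] = 0.0
    for i in range(max_slots + 1):
        for j in range(n + 1):
            cur = dp[i][j]
            if cur == NEG_INF:
                continue
            if i < max_slots and j < n:
                # match or substitution: use the slot posterior, or the floor
                p = network[start + i].get(keyword[j], FLOOR)
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1], cur + math.log(p))
            if i < max_slots:
                # insertion: a slot with no corresponding keyword phoneme
                dp[i + 1][j] = max(dp[i + 1][j], cur + SKIP_COST)
            if j < n:
                # deletion: a keyword phoneme with no corresponding slot
                dp[i][j + 1] = max(dp[i][j + 1], cur + SKIP_COST)
    return max(dp[i][n] for i in range(max_slots + 1)) / n


def search(network, keyword, threshold):
    """Return (start_slot, score) for every start whose score passes `threshold`."""
    hits = []
    for start in range(len(network)):
        score = fuzzy_score(network, keyword, start)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy phoneme-split consensus network: one dict of posteriors per slot.
network = [
    {"k": 0.9, "g": 0.1},
    {"ae": 0.8, "eh": 0.2},
    {"t": 0.7, "d": 0.3},
    {"s": 1.0},
]
exact_hits = search(network, ["k", "ae", "t"], threshold=-0.5)
fuzzy_hits = search(network, ["g", "ae", "d"], threshold=-1.5)
```

Because the match is scored on phoneme posteriors rather than on whole-unit identity, a keyword that never appeared in the recognizer's vocabulary can still be hit whenever its phoneme sequence is well supported by the network. In practice the substitution and skip penalties would come from a trained phoneme confusion model rather than from fixed constants.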
