Enhancing low resource keyword spotting with automatically retrieved web documents

Keyword spotting (KWS) systems developed for low resource languages with very little transcribed audio suffer from a small vocabulary, and hence a high out-of-vocabulary (OOV) rate, and from a weak language model. In this paper, we propose to augment such systems with automatically retrieved web documents. Our procedure queries a search engine with automatically generated query terms and can find, within a few hours, large volumes of web documents similar to a small pool of training transcriptions. We then use simple language identification to extract high-confidence text for lexicon expansion and language modeling. Experiments using six very limited language packs (VLLPs) from the IARPA Babel program show that web documents can cut the OOV rate on the development set in half and improve keyword spotting performance by 2.8 points absolute on average, as measured by Actual Term Weighted Value (ATWV). In particular, we find that most of the gains (8.7 points on average) come from keywords that were OOV in the baseline system and become in-vocabulary (IV) through lexicon expansion. These gains are obtained even after using subword units (unsupervised syllable-like units and sequences of phones), which are known to greatly enhance OOV keyword search performance.
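
As a rough illustration of how lexicon expansion lowers the OOV rate, the following minimal Python sketch measures the token-level OOV rate of a development set before and after adding words from retrieved text. It is not the system described above: the toy sentences, the whitespace tokenizer, the token-level definition of OOV rate, and the hypothetical `min_count` threshold are all assumptions made for the example.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def oov_rate(lexicon, dev_tokens):
    """Fraction of development tokens that are absent from the lexicon."""
    if not dev_tokens:
        return 0.0
    missing = sum(1 for token in dev_tokens if token not in lexicon)
    return missing / len(dev_tokens)


def expand_lexicon(base_lexicon, web_tokens, min_count=2):
    """Add web words seen at least min_count times (hypothetical threshold)."""
    counts = Counter(web_tokens)
    frequent = {word for word, count in counts.items() if count >= min_count}
    return set(base_lexicon) | frequent


# Toy data standing in for training transcripts, filtered web text, and dev data.
train_text = "the cat sat on the mat"
web_text = "the dog sat on the rug the dog ran to the rug"
dev_tokens = tokenize("the dog sat on the rug")

base_lexicon = set(tokenize(train_text))
expanded_lexicon = expand_lexicon(base_lexicon, tokenize(web_text))

print("baseline OOV rate:", oov_rate(base_lexicon, dev_tokens))
print("expanded OOV rate:", oov_rate(expanded_lexicon, dev_tokens))
```

In a real pipeline the web text would first pass through language identification, and each new word would also need a pronunciation before it could be added to the recognizer's lexicon; neither step is shown here.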
