Cross-word sub-word units for low-resource keyword spotting

We investigate the use of sub-word lexical units for detecting out-of-vocabulary (OOV) keywords in the keyword spotting task. We compare sub-word units based on morphological decomposition with units based on character n-grams and, in particular, examine the benefit of sub-word units that cross word boundaries. Experiments are performed on the IARPA Babel Turkish dataset. Our results demonstrate that cross-word sub-word units achieve performance on OOV keywords similar to that of other types of sub-word units, but that combining them with those units produces further gains. We also show that sub-word units can be used to improve detection of in-vocabulary keywords. System combination provides an 18% relative gain in actual term-weighted value (ATWV) with the best two systems, and 25% with the best three systems.
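To illustrate the distinction between word-internal and cross-word character n-grams, the following minimal Python sketch enumerates candidate units of each kind from a transcript. It is an illustration only, not the procedure used in the experiments: the function names, the `_` boundary marker, and the choice to list every overlapping n-gram are all assumptions made for the example.

```python
def within_word_ngrams(text, n):
    """Character n-grams that never span a word boundary.

    Words shorter than n are kept whole (an assumption of this sketch).
    """
    units = []
    for word in text.split():
        if len(word) < n:
            units.append(word)
        else:
            units.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return units


def cross_word_ngrams(text, n, boundary="_"):
    """Character n-grams taken over the whole utterance.

    Word boundaries are replaced by a marker symbol (hypothetical choice),
    so some units contain the end of one word and the start of the next.
    """
    joined = boundary.join(text.split())
    if len(joined) < n:
        return [joined]
    return [joined[i:i + n] for i in range(len(joined) - n + 1)]


if __name__ == "__main__":
    utterance = "ev de"
    print(within_word_ngrams(utterance, 2))
    print(cross_word_ngrams(utterance, 3))
```

In the cross-word case, units such as `v_d` carry context from both sides of a word boundary, which word-internal units cannot represent.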
