Paper information - Cross-word sub-word units for low-resource keyword spotting

We investigate the use of sub-word lexical units for the detection of out-of-vocabulary (OOV) keywords in the keyword spotting task. Sub-word units based on morphological decomposition and character n-grams are compared. In particular, we examine the benefit of sub-word units that cross word boundaries. Experiments are performed on the IARPA Babel Turkish dataset. Our results demonstrate that cross-word sub-word units achieve performance on OOV keywords similar to that of other types of sub-word units, but can be combined to produce further gains. We also show that sub-word units can be used to improve detection of in-vocabulary keywords. System combination provides an 18% relative gain in actual term-weighted value (ATWV) with the best two systems, and 25% with the best three systems.
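To make the distinction between within-word and cross-word units concrete, the following is a minimal sketch that enumerates character n-grams in both ways. It is an illustration only: the abstract does not describe how the units are actually built, so the function names, the overlapping enumeration, the `_` boundary marker, and the handling of short words are all assumptions rather than the authors' method.

```python
from typing import List


def within_word_ngrams(text: str, n: int) -> List[str]:
    """Enumerate character n-grams that never span a word boundary."""
    grams: List[str] = []
    for word in text.split():
        if len(word) < n:
            # Assumption: words shorter than n are kept whole.
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def cross_word_ngrams(text: str, n: int, boundary: str = "_") -> List[str]:
    """Enumerate character n-grams over the whole utterance.

    Words are joined with a boundary marker (hypothetical choice: "_"),
    so some n-grams contain the end of one word and the start of the next.
    """
    joined = boundary.join(text.split())
    if len(joined) < n:
        return [joined] if joined else []
    return [joined[i:i + n] for i in range(len(joined) - n + 1)]


if __name__ == "__main__":
    utterance = "bir ev"  # toy two-word input
    print(within_word_ngrams(utterance, 3))
    print(cross_word_ngrams(utterance, 3))
```

In this sketch, units such as `r_e` exist only in the cross-word inventory, which is the kind of unit that can cover a keyword spanning a word boundary.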

Jean-Luc Gauvain | Lori Lamel | William Hartmann
