Keyword Spotting of Arbitrary Words Using Minimal Speech Resources

Traditional approaches to keyword spotting employ a large-vocabulary speech recognizer, a phone recognizer, or a whole-word approach such as whole-word hidden Markov models. Each of these approaches requires considerable speech resources to build a word spotting system. In this paper we describe a keyword spotting system that requires about fifteen minutes of word-level transcriptions of speech as its sole annotated resource. The system uses our self-organizing speech recognizer, which defines its own sound units, to recognize speech in the domain under consideration. The transcriptions are used to train a grapheme-to-sound-unit converter. We describe this novel system and report its keyword spotting performance.
