Keyword spotting based on the analysis of template matching distances

This paper presents a system for speaker-independent keyword spotting (KWS) in continuous speech using a spoken example template. The approach uses Dynamic Time Warping (DTW) to match the template against a test utterance and, unlike alternative techniques such as the Hidden Markov Model (HMM), requires no modelling or training. This is particularly relevant to applications such as detecting words that are not adequately represented in a training database (e.g. searching for topical words that are emerging in society). The paper introduces the use of the DTW distance histogram to automatically estimate a similarity threshold for every keyword-utterance pair. Experiments conducted on a wide range of speech sentences and keywords show that, when only a few examples of the keyword are available, the proposed system achieves a higher recall ratio than an HMM-based approach.
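
To make the template-matching step concrete, the following is a minimal sketch of subsequence DTW, which slides a keyword template over a longer utterance and reports a matching distance for every possible end frame. It is an illustration under stated assumptions, not the system described in the paper: the Euclidean frame distance, the step pattern, the normalisation by template length, and the function names `frame_distance` and `subsequence_dtw` are all hypothetical choices. The histogram-based threshold estimation is not reproduced here, since the abstract does not specify it.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(template, utterance):
    """For each utterance frame j, return the template-length-normalised cost
    of the best alignment of the whole template that ends at frame j."""
    m, n = len(template), len(utterance)
    inf = float("inf")
    # Row 0 is all zeros, so a match may start at any utterance frame for free.
    prev = [0.0] * (n + 1)
    for i in range(1, m + 1):
        curr = [inf] * (n + 1)
        for j in range(1, n + 1):
            cost = frame_distance(template[i - 1], utterance[j - 1])
            # Allowed steps: vertical, diagonal, horizontal.
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return [prev[j] / m for j in range(1, n + 1)]


# Toy one-dimensional "features": the template occurs at utterance frames 1..3.
template = [[0.0], [1.0], [0.0]]
utterance = [[5.0], [0.0], [1.0], [0.0], [5.0]]
distances = subsequence_dtw(template, utterance)
best_end = distances.index(min(distances))  # frame index where the best match ends
```

The resulting list of distances is the kind of quantity a distance histogram could be built from; a keyword would be declared present where the distance falls below a chosen threshold.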
