11 Discriminative Keyword Spotting

This chapter introduces a discriminative method for detecting and spotting keywords in spoken utterances. Given a word represented as a sequence of phonemes and a spoken utterance, the keyword spotter predicts the best time span of the phoneme sequence in the spoken utterance, along with a confidence. If the prediction confidence is above a certain level, the keyword is declared to be spoken in the utterance within the predicted time span; otherwise the keyword is declared as not spoken. Training the keyword spotter is formulated as a discriminative task in which the model parameters are chosen so that an utterance in which the keyword is spoken receives a higher confidence than any utterance in which the keyword is not spoken. It is shown theoretically and empirically that the proposed training method results in a high area under the receiver operating characteristic (ROC) curve, the most common measure for evaluating keyword spotters. We present an iterative algorithm to train the keyword spotter efficiently. The proposed approach contrasts with standard spotting strategies based on HMMs, for which the training procedure does not maximize a loss directly related to spotting performance. Several experiments performed on the TIMIT and WSJ corpora show the advantage of our approach over HMM-based alternatives.
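The link between the training criterion and the area under the ROC curve can be illustrated with a small sketch. The area under the ROC curve equals the fraction of (keyword spoken, keyword not spoken) utterance pairs in which the first utterance receives the higher confidence, which is the Wilcoxon-Mann-Whitney statistic. The function names `auc_pairwise` and `keyword_detected` and the confidence values below are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
from itertools import product


def auc_pairwise(pos_conf, neg_conf):
    """Area under the ROC curve as a pairwise ranking statistic.

    pos_conf: confidences for utterances in which the keyword is spoken.
    neg_conf: confidences for utterances in which it is not spoken.
    Returns the fraction of (positive, negative) pairs ranked correctly;
    ties count as half a correct pair.
    """
    if not pos_conf or not neg_conf:
        raise ValueError("need at least one positive and one negative confidence")
    correct = 0.0
    for p, n in product(pos_conf, neg_conf):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos_conf) * len(neg_conf))


def keyword_detected(confidence, threshold):
    """Declare the keyword spoken when the confidence exceeds the threshold."""
    return confidence > threshold


# Made-up confidences, for illustration only.
pos = [2.1, 0.7, 1.5]   # keyword is spoken
neg = [0.3, 0.9, -0.4]  # keyword is not spoken

auc = auc_pairwise(pos, neg)  # 8 of the 9 pairs are ordered correctly
```

Under this view, a training procedure that pushes each positive utterance above each negative one acts on the same pairwise orderings that the area under the ROC curve counts, whereas the threshold in `keyword_detected` only selects an operating point on that curve.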
