Posterior based keyword spotting with a priori thresholds

In this paper, we propose a new posterior based scoring approach for keyword and non keyword (garbage) elements. The estimation of these scores is based on HMM state posterior probability definition, taking into account long contextual information and the prior knowledge (e.g. keyword model topology). The state posteriors are then integrated into keyword and garbage posteriors for every frame. These posteriors are used to make a decision on detection of the keyword at each frame. The frame level decisions are then accumulated (in this case, by counting) to make a global decision on having the keyword in the utterance. In this way, the contribution of possible outliers are minimized, as opposed to the conventional Viterbi decoding approach which accumulates likelihoods. Experiments on keywords from the Conversational Telephone Speech (CTS) and Numbers'95 databases are reported. Results show that the new scoring approach leads to better trade off between true and false alarms compared to the Viterbi decoding approach, while also providing the possibility to precalculate keyword specific spotting thresholds related to the length of the keywords.

[1]  John J. Godfrey,et al.  SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Steve Renals,et al.  Confidence measures for hybrid HMM/ANN speech recognition , 1997, EUROSPEECH.

[3]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[4]  Samy Bengio,et al.  Towards using hierarchical posteriors for flexible automatic speech recognition systems , 2004 .

[5]  Chin-Hui Lee,et al.  Automatic recognition of keywords in unconstrained speech using hidden Markov models , 1990, IEEE Trans. Acoust. Speech Signal Process..

[6]  M. A. Bush,et al.  Training and search algorithms for an interactive wordspotting system , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  W. Russell,et al.  Continuous hidden Markov modeling for speaker-independent word spotting , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[8]  Richard Rose,et al.  A hidden Markov model based keyword recognition system , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[9]  L. G. Miller,et al.  Improvements and applications for key word recognition using hidden Markov modeling techniques , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[10]  R. Cole,et al.  TELEPHONE SPEECH CORPUS DEVELOPMENT AT CSLU , 1998 .

[11]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[12]  Hervé Bourlard,et al.  Optimizing recognition and rejection performance in wordspotting systems , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  Samy Bengio,et al.  Developing and enhancing posterior based speech recognition systems , 2005, INTERSPEECH.