A fast search technique for discriminative keyword spotting

Keyword spotting systems fall into two main groups: HMM-based systems and discriminative systems. Some of these systems apply a phonetic search algorithm to the sequence of recognized phones to locate the target keyword in a set of speech utterances, so they need a search algorithm that is both fast and accurate. In this paper, we propose a hierarchical search algorithm. At each level of the hierarchy, segments of the input speech that have a low probability of containing the target keyword are discarded. This shrinks the search space, and therefore gives a faster search with lower computational complexity than the Viterbi algorithm, which is the usual choice for phonetic search in keyword spotting applications. We apply the proposed search method to the classification part of the discriminative keyword spotter introduced in our previous work. The experimental results indicate that, within this discriminative keyword spotting system, the hierarchical search algorithm is 100 times faster than the modified Viterbi algorithm. The cost is accuracy: the figure of merit (FOM) of the system with the proposed hierarchical search is about 2% lower than with the modified Viterbi algorithm.
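The abstract does not spell out the levels of the hierarchy, so the following is only a minimal Python sketch of the general idea of level-by-level pruning, not the authors' algorithm. The names `sliding_segments`, `hierarchical_search`, `coarse_score`, and `fine_score`, the window and hop sizes, the thresholds, and the toy per-frame scores are all assumptions made for illustration. Each level applies a scorer to the surviving segments and drops those below a threshold, so later (typically costlier) scorers run on fewer segments.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)
Scorer = Callable[[Segment], float]


def sliding_segments(n_frames: int, win: int, hop: int) -> List[Segment]:
    """Enumerate fixed-length candidate segments over the utterance."""
    return [(s, s + win) for s in range(0, n_frames - win + 1, hop)]


def hierarchical_search(
    candidates: List[Segment],
    levels: List[Tuple[Scorer, float]],
) -> Tuple[List[Segment], List[int]]:
    """Prune candidates level by level.

    Returns the surviving segments and, for each level, how many
    segments that level had to score.
    """
    survivors = list(candidates)
    evaluations = []
    for scorer, threshold in levels:
        evaluations.append(len(survivors))
        # Segments unlikely to contain the keyword are ignored from here on.
        survivors = [seg for seg in survivors if scorer(seg) >= threshold]
    return survivors, evaluations


# Toy data (hypothetical): per-frame keyword scores, high in frames 20..29.
frame_scores = [0.1] * 20 + [0.9] * 10 + [0.1] * 20


def coarse_score(seg: Segment) -> float:
    """Cheap level: look at the middle frame only."""
    start, end = seg
    return frame_scores[(start + end) // 2]


def fine_score(seg: Segment) -> float:
    """Costlier level: average over every frame in the segment."""
    start, end = seg
    return sum(frame_scores[start:end]) / (end - start)


candidates = sliding_segments(len(frame_scores), win=10, hop=5)
hits, evaluations = hierarchical_search(
    candidates, [(coarse_score, 0.5), (fine_score, 0.8)]
)
```

In this toy run the fine scorer is applied only to the segments that pass the coarse level, which is the source of the speedup the abstract describes; the risk, also noted in the abstract, is that an early level may discard a true keyword segment.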
