Paper information - Discriminative word-spotting using ordered spectro-temporal patch features

We present a novel word-spotting architecture that is trained from a small number of examples to classify an utterance as containing a target keyword or not. The architecture relies on a novel feature set: ordered spectro-temporal patches extracted from exemplar mel-spectra of the target keywords. We introduce a local pooling operation across frequency and time that gives the extracted patch features the flexibility to match previously unseen instances of a keyword. Finally, we describe how to train a support vector machine classifier to separate keyword from non-keyword patch feature responses. Preliminary results indicate that our word-spotting architecture achieves a detection rate of 70-95% with false positive rates of about 0.252 false positives per minute.
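The patch-and-pooling idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the patch sizes, or the pooling ranges, so everything here is an assumption made for illustration: the Gaussian similarity, the function names (`extract_patch`, `patch_similarity`, `pooled_response`, `feature_vector`), the default shift limits, and the toy spectrograms. It is not the authors' implementation.

```python
import math

def extract_patch(spec, f0, t0, height, width):
    """Cut a height x width patch from a spectrogram stored as spec[freq][time]."""
    return [row[t0:t0 + width] for row in spec[f0:f0 + height]]

def patch_similarity(patch, spec, f0, t0):
    """Gaussian similarity (an assumed measure) between a patch and the
    equally sized region of spec whose top-left corner is (f0, t0)."""
    sq_dist = 0.0
    for i, row in enumerate(patch):
        for j, value in enumerate(row):
            diff = value - spec[f0 + i][t0 + j]
            sq_dist += diff * diff
    return math.exp(-sq_dist)

def pooled_response(patch, origin, spec, max_df, max_dt):
    """Best similarity over shifts of at most max_df frequency bins and
    max_dt time frames around the patch's original position."""
    height, width = len(patch), len(patch[0])
    n_freq, n_time = len(spec), len(spec[0])
    f_origin, t_origin = origin
    best = 0.0
    for f0 in range(f_origin - max_df, f_origin + max_df + 1):
        for t0 in range(t_origin - max_dt, t_origin + max_dt + 1):
            # Skip shifts that would push the patch outside the spectrogram.
            if 0 <= f0 <= n_freq - height and 0 <= t0 <= n_time - width:
                best = max(best, patch_similarity(patch, spec, f0, t0))
    return best

def feature_vector(patches, spec, max_df=1, max_dt=2):
    """One pooled response per exemplar patch, kept in the exemplar's patch order."""
    return [pooled_response(p, origin, spec, max_df, max_dt) for p, origin in patches]

# Toy exemplar "mel-spectrum" (4 frequency bins x 6 frames) and a copy of it
# delayed by one frame, standing in for a new utterance of the same keyword.
exemplar = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
]
shifted = [[0.0] + row[:-1] for row in exemplar]

# Two 2x2 patches, listed in time order together with their original positions.
patches = [
    (extract_patch(exemplar, 0, 1, 2, 2), (0, 1)),
    (extract_patch(exemplar, 2, 3, 2, 2), (2, 3)),
]

print(feature_vector(patches, shifted))        # pooling absorbs the delay: [1.0, 1.0]
print(feature_vector(patches, shifted, 0, 0))  # no pooling: both responses drop below 1
```

In this sketch, pooling lets each patch respond fully even though the utterance is delayed by one frame, while the fixed patch order preserves the temporal layout of the keyword. Vectors of this kind, computed for keyword and non-keyword utterances, are what a support vector machine would then be trained on.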

Tony Ezzat | Tomaso A. Poggio
