Paper information - Tunable keyword-aware language modeling and context dependent fillers for LVCSR-based spoken keyword search

We explore the potential of keyword-aware language modeling to extend the ability to trade higher false alarm rates for lower miss detection rates in LVCSR-based keyword search (KWS). We also propose a context-dependent keyword language modeling method that further enhances the keyword-aware language modeling framework by reducing the false alarms often incurred in order to achieve the desired low miss detection rates. We demonstrate that, with keyword-aware language modeling, a KWS system can reach different operating points (misses vs. false alarms) by tuning a single language modeling parameter. We observe a relative gain of 20% in actual term weighted value (ATWV) for the keyword-aware KWS systems over conventional LVCSR-based KWS systems when testing on the English Switchboard data. Moreover, the proposed context-dependent keyword language modeling achieves a further 9% relative ATWV improvement over the original keyword-aware KWS systems for single-word keywords, which cause the most false alarms.
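The abstract reports results as relative ATWV gains, so a minimal sketch of how a term weighted value combines misses and false alarms may help. It assumes the NIST STD 2006 definition of TWV with a false alarm weight of 999.9, which the abstract itself does not state. The names `KeywordCounts`, `term_weighted_value`, and `relative_gain` are hypothetical, and any inputs are illustrative rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: false alarm weight from the NIST STD 2006 evaluation plan;
# the abstract does not state which value the authors used.
BETA = 999.9


@dataclass
class KeywordCounts:
    """Per-keyword detection counts at one decision threshold (hypothetical type)."""
    n_true: int         # occurrences of the keyword in the reference
    n_correct: int      # correct detections
    n_false_alarm: int  # spurious detections


def term_weighted_value(counts: Iterable[KeywordCounts],
                        speech_seconds: float,
                        beta: float = BETA) -> float:
    """TWV = 1 - mean over keywords of (P_miss + beta * P_FA).

    ATWV is this value computed at the system's actual decision threshold.
    """
    total = 0.0
    used = 0
    for c in counts:
        if c.n_true == 0:
            continue  # keywords absent from the reference are skipped
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials are approximated as one per second of speech.
        p_fa = c.n_false_alarm / (speech_seconds - c.n_true)
        total += p_miss + beta * p_fa
        used += 1
    if used == 0:
        raise ValueError("no keyword has reference occurrences")
    return 1.0 - total / used


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline
```

Under this definition a single false alarm costs far more than a single miss for a frequent keyword. That is why moving the operating point toward fewer misses, as described above, only pays off if the added false alarms stay under control.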

I-Fan Chen | Chin-Hui Lee | Tze Siong Lau
