Tunable keyword-aware language modeling and context dependent fillers for LVCSR-based spoken keyword search

We explore the potential of keyword-aware language modeling to extend the range over which an LVCSR-based keyword search (KWS) system can trade higher false alarm rates for lower miss detection rates. We also propose a context-dependent keyword language modeling method that strengthens the keyword-aware language modeling framework by reducing the false alarms that are often incurred to reach the desired low miss detection rates. We demonstrate that, with keyword-aware language modeling, a KWS system can reach different operating points (misses vs. false alarms) by tuning a single language modeling parameter. On the English Switchboard data, the keyword-aware KWS systems achieve a 20% relative gain in actual term-weighted value (ATWV) over conventional LVCSR-based KWS systems. Moreover, the proposed context-dependent keyword language modeling yields a further 9% relative ATWV improvement over the original keyword-aware KWS systems for single-word keywords, which cause the most false alarms.
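A short computation makes ATWV, and the misses-vs.-false-alarms trade-off it rewards, concrete. The sketch below assumes the standard NIST STD 2006 definition of term-weighted value (β = 999.9, one non-target trial per second of speech). ATWV is this value computed from the system's actual hard detection decisions. The keyword counts, the 10-hour test duration, and the names `KeywordStats` and `term_weighted_value` are hypothetical illustrations, not results or code from this work.

```python
from dataclasses import dataclass

# Assumed NIST STD 2006 setting: cost/value ratio 0.1 and term prior 1e-4,
# giving beta = 0.1 * (1 / 1e-4 - 1) = 999.9.
BETA = 999.9


@dataclass
class KeywordStats:
    """Hypothetical per-keyword counts at one detection threshold."""
    n_true: int         # reference occurrences of the keyword
    n_correct: int      # detections that match a reference occurrence
    n_false_alarm: int  # detections with no matching reference occurrence


def term_weighted_value(stats, speech_seconds, beta=BETA):
    """TWV = 1 - mean over keywords of [P_miss + beta * P_FA].

    Keywords with no reference occurrences are skipped. The number of
    non-target trials is approximated as (seconds of speech - n_true).
    """
    losses = []
    for s in stats:
        if s.n_true == 0:
            continue
        p_miss = 1.0 - s.n_correct / s.n_true
        p_fa = s.n_false_alarm / (speech_seconds - s.n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no keyword has reference occurrences")
    return 1.0 - sum(losses) / len(losses)


if __name__ == "__main__":
    seconds = 36000  # hypothetical 10 hours of test speech
    # Conservative operating point: few false alarms, many misses.
    conservative = [KeywordStats(10, 6, 1), KeywordStats(5, 2, 0)]
    # Aggressive operating point: more false alarms, far fewer misses.
    aggressive = [KeywordStats(10, 9, 3), KeywordStats(5, 4, 2)]
    print(f"{term_weighted_value(conservative, seconds):.4f}")  # 0.4861
    print(f"{term_weighted_value(aggressive, seconds):.4f}")    # 0.7805
```

With 36,000 seconds of speech, each false alarm adds roughly β/T ≈ 0.028 to a keyword's loss, while each miss of a keyword with 10 true occurrences adds 0.1. In this toy case the aggressive operating point therefore scores higher. With many more false alarms per keyword the balance tips the other way, which is why being able to tune the operating point matters.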
