Vocabulary independent discriminative utterance verification for nonkeyword rejection in subword based speech recognition

An integral part of any deployable speech recognition system is the capability to detect whether the input speech contains none of the words in the recognizer's vocabulary. This capability, called utterance verification (or keyword recognition and nonkeyword rejection), is becoming increasingly important as speech recognition systems continue to migrate from the laboratory to actual applications. We present a framework and a method for vocabulary-independent utterance verification in subword-based speech recognition. The verification process is cast as a statistical hypothesis test, where vocabulary independence is accomplished through a two-stage verification process: subword-level verification followed by string-level verification. A verification function is defined and discriminatively trained to perform subword-level verification. String-level verification is accomplished by defining and evaluating an overall string-level log likelihood ratio that is a function of the subword-level verification scores. Experimental results show that this vocabulary-independent discriminative utterance verification method significantly outperforms a baseline method commonly used in wordspotting tasks.
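
The two-stage structure described above can be illustrated with a minimal Python sketch. The abstract does not give the form of the verification function or of the string-level combination, so everything here is an assumption made for illustration: `subword_score` uses a plain log likelihood ratio between a target model and a hypothetical alternative model, `string_score` simply averages the subword scores, and the names and the threshold value are invented. It is not the discriminatively trained function the paper defines.

```python
from typing import Sequence


def subword_score(target_loglik: float, alt_loglik: float) -> float:
    """Stage 1 (assumed form): log likelihood ratio for one subword segment.

    target_loglik: log likelihood under the recognized subword's model.
    alt_loglik: log likelihood under a hypothetical alternative model.
    """
    return target_loglik - alt_loglik


def string_score(subword_scores: Sequence[float]) -> float:
    """Stage 2 (assumed form): combine subword scores into one string score.

    A plain average stands in for the paper's string-level log likelihood
    ratio, whose exact definition is not stated in the abstract.
    """
    if not subword_scores:
        raise ValueError("at least one subword score is required")
    return sum(subword_scores) / len(subword_scores)


def verify(subword_scores: Sequence[float], threshold: float = 0.0) -> bool:
    """Hypothesis test: accept the recognized string if the score clears the threshold."""
    return string_score(subword_scores) >= threshold


# Usage: three subword segments of one recognized string (made-up values).
scores = [subword_score(-10.0, -12.5), subword_score(-8.0, -9.0), subword_score(-7.0, -6.5)]
accepted = verify(scores, threshold=0.0)
```

Because the decision depends only on subword-level scores, nothing in this sketch is tied to a particular word list, which is the sense in which such a scheme can be vocabulary independent. Raising the threshold would trade more rejected keywords for fewer accepted nonkeywords.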
