Paper information - Verifying and correcting recognition string hypotheses using discriminative utterance verification

Abstract: Utterance verification (UV) is a process by which the output of a speech recognizer is verified to determine if the input speech actually includes the recognized keyword(s). The output of the speech verifier is a binary decision to accept or reject the recognized utterance based on a UV confidence score. In this paper, we extend the notion of utterance verification by presenting an utterance verification method that will be utilized to perform three tasks: (1) detect non-keyword strings (false alarms), (2) detect keyword substitution errors, and (3) selectively correct substitution errors when N-best string hypotheses are available. The utterance verification method presented here employs a set of verification-specific models that are independent of the models used in the recognition process. The verification models are trained using a discriminative training procedure that seeks to minimize the verification error by simultaneously maximizing the rejection of non-keywords and misrecognized keywords while minimizing the rejection of correctly recognized keywords. The error correction is performed by reordering the hypotheses produced by an N-best recognizer based on a UV confidence score.
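The accept/reject decision and the N-best reordering described in the abstract can be illustrated with a minimal sketch. It assumes each hypothesis already carries a UV confidence score where higher means more confident. The names `Hypothesis`, `uv_score`, `verify`, and `rerank_and_verify`, the score values, and the threshold are all hypothetical and chosen for illustration. The abstract does not specify how the score is computed or how correction is applied selectively, so this sketch simply picks the highest-scoring hypothesis and then thresholds it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    text: str        # recognized keyword string
    uv_score: float  # assumed UV confidence score (higher = more confident)


def verify(hyp: Hypothesis, threshold: float) -> bool:
    """Binary accept/reject decision based on the UV confidence score."""
    return hyp.uv_score >= threshold


def rerank_and_verify(nbest: List[Hypothesis], threshold: float) -> Optional[Hypothesis]:
    """Reorder N-best hypotheses by UV score, then accept or reject the top one.

    Returns the accepted hypothesis, or None if the utterance is rejected.
    Ties keep the recognizer's original ordering, because max() returns the
    first maximal element.
    """
    if not nbest:
        return None
    best = max(nbest, key=lambda h: h.uv_score)
    return best if verify(best, threshold) else None


# Illustrative input: the recognizer's top choice ("1 2 3") has a lower
# assumed UV score than its second choice ("1 2 9").
nbest = [
    Hypothesis("1 2 3", 0.2),
    Hypothesis("1 2 9", 0.8),
    Hypothesis("7 2 3", -0.5),
]

accepted = rerank_and_verify(nbest, threshold=0.5)
rejected = rerank_and_verify(nbest, threshold=0.9)
```

With a threshold of 0.5 the second recognizer hypothesis is promoted and accepted, which corresponds to correcting a substitution error. With a threshold of 0.9 no hypothesis is confident enough, so the utterance is rejected.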
