Verifying and correcting recognition string hypotheses using discriminative utterance verification

Abstract

Utterance verification (UV) is the process of verifying the output of a speech recognizer to determine whether the input speech actually contains the recognized keyword(s). The output of the verifier is a binary decision to accept or reject the recognized utterance, based on a UV confidence score. In this paper, we extend the notion of utterance verification by presenting a method that performs three tasks: (1) detect non-keyword strings (false alarms), (2) detect keyword substitution errors, and (3) selectively correct substitution errors when N-best string hypotheses are available. The method employs a set of verification-specific models that are independent of the models used in the recognition process. The verification models are trained with a discriminative training procedure that seeks to minimize the verification error by simultaneously maximizing the rejection of non-keywords and misrecognized keywords while minimizing the rejection of correctly recognized keywords. Error correction is performed by reordering the hypotheses produced by an N-best recognizer according to their UV confidence scores.
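The accept/reject decision and the N-best reordering described above can be illustrated with a minimal sketch. The abstract does not define the confidence score, so this example assumes a hypothetical one: the mean of per-word log-likelihood ratios between a keyword verification model and an anti-keyword model. The names `Hypothesis`, `word_llrs`, `uv_confidence`, and `verify_and_correct`, as well as the threshold value, are illustrative assumptions and not the paper's formulation. The sketch also always picks the top-scoring hypothesis, whereas the paper corrects errors selectively.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """One N-best string hypothesis (hypothetical container)."""
    words: Tuple[str, ...]
    # ASSUMPTION: one log-likelihood ratio per word, computed elsewhere from
    # verification-specific models (keyword model vs. anti-keyword model).
    word_llrs: Tuple[float, ...]


def uv_confidence(hyp: Hypothesis) -> float:
    """ASSUMED string-level score: the mean of the per-word ratios."""
    if not hyp.word_llrs:
        return float("-inf")
    return sum(hyp.word_llrs) / len(hyp.word_llrs)


def verify_and_correct(
    nbest: Sequence[Hypothesis], threshold: float
) -> Tuple[Optional[Hypothesis], float]:
    """Reorder the N-best list by UV confidence, then accept or reject.

    Returns (hypothesis, score) if the top-scoring hypothesis passes the
    threshold, otherwise (None, score) to signal rejection.
    """
    if not nbest:
        return None, float("-inf")
    # max() returns the first maximal item, so ties keep the recognizer's order.
    best = max(nbest, key=uv_confidence)
    score = uv_confidence(best)
    return (best if score >= threshold else None), score


# Made-up scores: the recognizer's top choice contains a poorly verified word.
nbest = [
    Hypothesis(("call", "home"), (0.2, -1.5)),
    Hypothesis(("call", "Rome"), (0.4, 0.9)),
]
accepted, score = verify_and_correct(nbest, threshold=0.0)
rejected, _ = verify_and_correct(nbest[:1], threshold=0.0)
```

With these made-up scores, the second hypothesis overtakes the recognizer's first choice and is accepted, while the first hypothesis on its own falls below the threshold and is rejected. In practice the threshold would be tuned on held-out data to balance false alarms against false rejections.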
