Recovery from false rejection using statistical partial pattern trees for sentence verification

In conversational speech recognition, recognizers are generally equipped with a keyword spotting capability to accommodate a variety of speaking styles, and incorporating a language model generally improves recognition performance. However, keyword spotting in conversational speech produces two types of errors: false alarms and false rejections. Language models do not account for these errors, so the errors offset the benefit the language model would otherwise provide. This paper describes a partial pattern tree (PPT) that models the partial grammatical rules of sentences arising from recognition errors and from ungrammatical utterances. Using the PPT and a proposed sentence-scoring algorithm, false rejection errors are first recovered. A sentence verification approach then re-ranks and verifies the recovered sentence hypotheses to produce the final result. A PPT merging algorithm is also proposed that merges partial patterns with similar syntactic structure, reducing the number of partial patterns and thus the size of the PPT. An automatic call manager and an airline query system were implemented to assess performance. With the proposed approach, these two systems achieved keyword error rates of 10.40% and 14.67%, respectively. Comparisons with conventional approaches show that the proposed method performs better.
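The PPT and sentence-scoring steps described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a PPT is a prefix tree built by inserting every suffix of each training keyword sequence (so every contiguous partial pattern is a path from the root), that a hypothesis is scored by summing log relative frequencies along its path, and that a break in the pattern costs a fixed penalty `floor` before scoring restarts at the root. The names `PartialPatternTree`, `add_sentence`, `score`, `rerank`, the `floor` and `weight` parameters, and the example keywords are all hypothetical. This is not the paper's scoring algorithm, and the false-rejection recovery, sentence verification, and PPT merging stages are not shown.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0                                # training patterns passing through this node
    children: dict = field(default_factory=dict)  # keyword -> Node


class PartialPatternTree:
    """Hypothetical PPT sketch: a prefix tree over keyword sequences.

    Inserting every suffix of each training sentence makes every
    contiguous partial pattern a path starting at the root.
    """

    def __init__(self, floor=1e-3):
        self.root = Node()
        self.floor = floor  # assumed penalty probability for a broken pattern

    def add_sentence(self, keywords):
        for start in range(len(keywords)):
            self.root.count += 1
            node = self.root
            for kw in keywords[start:]:
                node = node.children.setdefault(kw, Node())
                node.count += 1

    def score(self, keywords):
        """Log score of a keyword hypothesis under the tree (illustrative scoring rule)."""
        total, node = 0.0, self.root
        for kw in keywords:
            child = node.children.get(kw)
            if child is None and node is not self.root:
                # The current partial pattern breaks: pay a penalty, restart at the root.
                total += math.log(self.floor)
                node = self.root
                child = node.children.get(kw)
            if child is None:
                # Keyword never seen in training: penalize and stay at the root.
                total += math.log(self.floor)
                continue
            # Relative frequency of this continuation given the path so far.
            total += math.log(child.count / node.count)
            node = child
        return total


def rerank(tree, hypotheses, weight=1.0):
    """Sort (keywords, acoustic_log_score) pairs by acoustic + weight * PPT score."""
    return sorted(hypotheses, key=lambda h: h[1] + weight * tree.score(h[0]), reverse=True)


ppt = PartialPatternTree()
for sentence in (["call", "john", "office"], ["call", "mary", "home"], ["transfer", "john", "office"]):
    ppt.add_sentence(sentence)

hyps = [(["call", "home", "john"], -10.0), (["call", "john", "office"], -10.5)]
print(rerank(ppt, hyps)[0][0])  # → ['call', 'john', 'office']
```

Combining the acoustic and PPT scores with a single `weight` is one simple design choice; in this toy run, the hypothesis that follows a seen partial pattern overtakes an acoustically better but fragmented one.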
