The effect of pruning and compression on graphical representations of the output of a speech recognizer

Large vocabulary continuous speech recognition benefits from an efficient data structure for compactly representing a large number of acoustic hypotheses. Word graphs, or lattices, are widely used as such an interface between the acoustic recognition engine and subsequent language processing modules. This paper first investigates the effect of pruning during acoustic decoding on the quality of word lattices, and shows that by combining pruning options at the model level and the word level, we can obtain word lattices of manageable size with accuracy comparable to that of the original lattices. To serve as input to a post-processing language module, a word lattice should preserve the target hypotheses and their scores while remaining as small as possible. We therefore introduce a word graph compression algorithm that significantly reduces the number of words in the graph without eliminating any utterance hypothesis or distorting its acoustic score. We compare this algorithm with several other lattice size-reducing approaches and demonstrate its relative strength at decreasing the number of words in the representation. Experiments across corpora and vocabulary sizes assess the consistency of the pruning and compression results.
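To make the compression idea concrete, the sketch below illustrates one score-preserving reduction of the kind described above: lattice nodes whose outgoing (word, score, destination) structure is identical are merged, so the graph shrinks while the set of utterance hypotheses and their acoustic scores stay exactly the same. This is a minimal illustration under stated assumptions, not the paper's algorithm; the Lattice class, the merge_equivalent_nodes routine, and the assumption that node ids increase along the time axis are all invented for this example.

```python
from collections import defaultdict

class Lattice:
    """A word lattice: a DAG whose edges carry (word, acoustic score)."""

    def __init__(self):
        # edges[src] is the set of (word, score, dst) triples leaving src.
        self.edges = defaultdict(set)

    def add_edge(self, src, word, score, dst):
        self.edges[src].add((word, score, dst))

def merge_equivalent_nodes(lat):
    """Merge nodes whose outgoing edge sets are identical.

    Merged nodes admit exactly the same (word, score) continuations, so
    the set of complete hypotheses and all path scores are unchanged
    while the node and edge counts shrink. Assumes node ids increase
    along the time axis, so sorting gives a reverse topological order.
    """
    signature = {}  # frozen outgoing-edge set -> canonical node id
    remap = {}      # merged node id -> canonical node id

    for node in sorted(lat.edges, reverse=True):
        canon = frozenset(
            (word, score, remap.get(dst, dst))
            for (word, score, dst) in lat.edges[node]
        )
        if canon in signature:
            remap[node] = signature[canon]  # duplicate structure: merge
        else:
            signature[canon] = node
            lat.edges[node] = set(canon)    # point at canonical successors

    for node in remap:                      # drop the merged-away nodes
        del lat.edges[node]
    return lat

# Two paths "a b" and "c b" share the suffix "b"; nodes 1 and 2 merge.
lat = Lattice()
lat.add_edge(0, "a", -1.0, 1)
lat.add_edge(0, "c", -2.0, 2)
lat.add_edge(1, "b", -0.5, 3)
lat.add_edge(2, "b", -0.5, 3)
merge_equivalent_nodes(lat)
print(sum(len(es) for es in lat.edges.values()))  # 3 edges instead of 4
```

A symmetric pass that merges nodes with identical incoming edge sets can be alternated with this backward pass to shrink the graph further; either pass alone already preserves every hypothesis and score exactly.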
