Novel two-pass search strategy using time-asynchronous shortest-first second-pass beam search

In this paper, we describe a novel two-pass search strategy for large-vocabulary continuous speech recognition. The first pass uses a regular time-synchronous beam search with rough models to generate a word lattice. The second pass then derives exact results from the word lattice using more accurate models. We call this second-pass search “time-asynchronous shortest-first beam search”. It has two novel features: a time-asynchronous beam search mechanism that uses the scores on the word lattice nodes as heuristics, and a strict pruning scheme based on shortest-first hypothesis extension. Recognition experiments on Japanese broadcast news with a 20k-word vocabulary show that our second-pass search is more accurate and more efficient than either N-best rescoring or A* search, the conventional second-pass search methods.
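The second-pass idea can be illustrated with a minimal Python sketch. It is a reading of the abstract under stated assumptions, not the authors' implementation. The names `backward_heuristic`, `second_pass`, and `exact_score`, the toy lattice, and the scores are all hypothetical. The sketch assumes that lattice nodes are integers numbered in time order (every edge goes from a smaller to a larger node), that scores are log-domain values where higher is better, and that the node heuristic is the best first-pass (rough) score from a node to the end of the lattice.

```python
from collections import defaultdict

def backward_heuristic(edges, final):
    """Best rough score from each node to the final node (assumed heuristic)."""
    out = defaultdict(list)
    for src, dst, _word, rough in edges:
        out[src].append((dst, rough))
    h = {final: 0.0}
    # Node ids are assumed to follow time order, so visit them latest-first.
    for node in sorted(out, reverse=True):
        reachable = [rough + h[dst] for dst, rough in out[node] if dst in h]
        if reachable:
            h[node] = max(reachable)
    return h

def second_pass(edges, start, final, exact_score, beam=5.0):
    """Rescore a word lattice, extending the shortest hypotheses first."""
    out = defaultdict(list)
    for src, dst, word, _rough in edges:
        out[src].append((dst, word))
    h = backward_heuristic(edges, final)
    # pending[node]: (exact score so far, words) for hypotheses ending at node.
    pending = defaultdict(list)
    pending[start].append((0.0, ()))
    for node in sorted(h):  # shortest-first: earliest end node is expanded first
        if node == final or not pending[node]:
            continue
        # Time-asynchronous comparison: hypotheses ending at different nodes
        # are made comparable by adding the heuristic of their end node.
        best = max(g + h[n] for n, hyps in pending.items() for g, _ in hyps)
        for g, words in pending.pop(node):
            if g + h[node] < best - beam:
                continue  # outside the beam: prune
            prev = words[-1] if words else None
            for dst, word in out[node]:
                if dst in h:  # skip edges that cannot reach the final node
                    pending[dst].append((g + exact_score(prev, word), words + (word,)))
    return max(pending[final], default=(float("-inf"), ()))

# Toy lattice: (source node, destination node, word, rough first-pass score).
EDGES = [(0, 1, "a", -1.0), (0, 1, "b", -1.2), (1, 2, "c", -1.0),
         (1, 3, "d", -2.5), (2, 3, "e", -1.0)]

# Hypothetical "more accurate model": a bigram-style score table.
EXACT = {(None, "a"): -1.5, (None, "b"): -1.0, ("a", "c"): -1.0,
         ("b", "c"): -0.5, ("c", "e"): -1.0, ("a", "d"): -3.0, ("b", "d"): -3.0}

def exact_score(prev, word):
    return EXACT[(prev, word)]

print(second_pass(EDGES, 0, 3, exact_score))
```

On this toy lattice the rough models prefer the path "a c e", while the exact scores prefer "b c e". The second pass returns the latter even with a narrow beam, because the heuristic lets a hypothesis ending at an early node be compared with hypotheses that already extend further.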
