Paper Information - Parallelization Strategies for a Dynamic Lexical Tree Decoder

Parallelization Strategies for a Dynamic Lexical Tree Decoder

Increasingly, physical limitations are driving a shift from high-clocked single-core processors to CPUs with eight or more independent but slower processing cores, and to multi-core or even multi-CPU computers. To retain performance gains in the future, the speech decoding process has to be reorganized to employ a certain amount of thread-level parallelism on those CPUs. In this work, we compare two common approaches for dynamic prefix tree decoders, Parallel Score Computation and Parallel Search, as well as a combination of both. Both have already been studied intensively; however, we show here that the latter suffers from hardware cache effects that limit absolute speed-ups and scalability in general. We propose a cache-efficient variation of Parallel Score Computation that is more scalable and faster than any other parallel strategy we compared it with.

Florian Metze | Matthias Vogelgesang
