On Efficient Coupling of ASR and SMT for Speech Translation

This paper presents an efficient, tightly integrated approach to speech translation in which the automatic speech recognition (ASR) and statistical machine translation (SMT) components are coupled in both directions. First, our SMT decoder takes the speech recognition lattice as input and performs an integrated search for the optimal translation, combining multiple ASR scores with the translation models. The approach is implemented within the recently proposed Folsom SMT framework, whose multilayer search algorithm operates efficiently on multiple graphs; this yields the memory efficiency and speed that are critical for real-time speech translation applications, and also provides significant accuracy improvements. Second, we report experiments in which the ASR is customized by reinforcing its language model to favor the downstream translation component. We evaluated our approach on a large-vocabulary speech translation task and obtained an improvement of more than 2 BLEU points over standard cascaded 1-best speech translation.
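To make the contrast between cascaded 1-best translation and integrated search concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the Folsom multilayer search: the lattice is flattened into a short list of complete paths, each path has a single candidate translation, and all names, weights, and probabilities (`asr_paths`, `translations`, `asr_weight`, `tm_weight`) are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical toy "lattice", flattened into complete paths with ASR log-probabilities.
asr_paths = {
    "recognize speech": math.log(0.45),
    "wreck a nice beach": math.log(0.55),
}

# Hypothetical translation-model output: one candidate translation per source
# path, paired with its translation log-probability.
translations = {
    "recognize speech": ("reconocer el habla", math.log(0.60)),
    "wreck a nice beach": ("destrozar una bonita playa", math.log(0.20)),
}


def cascaded_1best(asr_paths, translations):
    """Commit to the single best ASR path, then translate it."""
    best_source = max(asr_paths, key=asr_paths.get)
    return translations[best_source][0]


def integrated_search(asr_paths, translations, asr_weight=1.0, tm_weight=1.0):
    """Pick the path that maximizes a weighted sum of ASR and translation log scores."""

    def joint_score(source):
        return asr_weight * asr_paths[source] + tm_weight * translations[source][1]

    best_source = max(asr_paths, key=joint_score)
    return translations[best_source][0]


if __name__ == "__main__":
    print("cascaded 1-best:", cascaded_1best(asr_paths, translations))
    print("integrated     :", integrated_search(asr_paths, translations))
```

In this toy setting the cascade commits to the ASR path with the higher recognition score, while the joint score prefers the other path because its translation score is much stronger. A real lattice decoder would explore partial paths rather than enumerate complete ones.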