An Empirical Study of End-To-End Simultaneous Speech Translation Decoding Strategies

This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study on two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities, including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding approach makes it possible to control the BLEU/Average Lagging trade-off across different latency regimes. Our best decoding settings achieve results comparable to a strong cascade model evaluated on the simultaneous translation track of the IWSLT 2020 shared task.

Laurent Besacier | Ha Nguyen | Yannick Estève
