Is "moby dick" a Whale or a Bird? Named Entities and Terminology in Speech Translation

Automatic translation systems are known to struggle with rare words. Among these, named entities (NEs) and domain-specific terms are crucial, since errors in their translation can lead to severe meaning distortions. Despite their importance, previous speech translation (ST) studies have neglected them, partly because of the dearth of publicly available resources tailored to their specific evaluation. To fill this gap, we i) present the first systematic analysis of the behavior of state-of-the-art ST systems in translating NEs and terminology, and ii) release NEuRoparl-ST, a novel benchmark built from European Parliament speeches annotated with NEs and terminology. Our experiments on the three language directions covered by our benchmark (en→es/fr/it) show that ST systems correctly translate 75–80% of terms and 65–70% of NEs, with very low performance (37–40%) on person names.
