The Multilingual TEDx Corpus for Speech Recognition and Translation

We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. The corpus is a collection of audio recordings from TEDx talks in 8 source languages. We segment transcripts into sentences and align them to the source-language audio and target-language translations. The corpus is released along with open-sourced code enabling extension to new talks and languages as they become available. Our corpus creation methodology can be applied to more languages than previous work, and creates multi-way parallel evaluation sets. We provide baselines in multiple ASR and ST settings, including multilingual models to improve translation performance for low-resource language pairs.

Elizabeth Salesky | Douglas W. Oard | Matt Post | Marco Turchi | Matteo Negri | Roldano Cattoni | Jacob Bremerman | Matthew Wiesner
