SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
Jingfei Du | Holger Schwenk | Benoît Sagot | J. Pino | Changhan Wang | Ann Lee | Hongyu Gong | Paul-Ambroise Duquenne | Ning Dong | Vedanuj Goswami
[1] A. Conneau,et al. FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech, 2022, 2022 IEEE Spoken Language Technology Workshop (SLT).
[2] Shannon L. Spruit,et al. No Language Left Behind: Scaling Human-Centered Machine Translation , 2022, ArXiv.
[3] Holger Schwenk,et al. Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages , 2022, EMNLP.
[4] Holger Schwenk,et al. T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation , 2022, EMNLP.
[5] James R. Glass,et al. SAMU-XLSR: Semantically-Aligned Multimodal Utterance-Level Cross-Lingual Speech Representation , 2022, IEEE Journal of Selected Topics in Signal Processing.
[6] J. Dean,et al. Designing Effective Sparse Expert Models , 2022, 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[7] Yossi Adi,et al. Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation , 2022, INTERSPEECH.
[8] A. Conneau,et al. Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation , 2022, INTERSPEECH.
[9] Li Dong,et al. DeepNet: Scaling Transformers to 1,000 Layers , 2022, ArXiv.
[10] Michelle Tadmor Ramanovich,et al. CVSS Corpus and Massively Multilingual Speech-to-Speech Translation , 2022, LREC.
[11] H. Schwenk,et al. Textless Speech-to-Speech Translation on Real Data , 2021, NAACL.
[12] Juan Pino,et al. XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale , 2021, INTERSPEECH.
[13] Michelle Tadmor Ramanovich,et al. Translatotron 2: High-quality direct speech-to-speech translation with voice preservation , 2021, ICML.
[14] A. Polyak,et al. Direct Speech-to-Speech Translation With Discrete Units , 2021, ACL.
[15] Marc'Aurelio Ranzato,et al. The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation , 2021, TACL.
[16] Daniel Matthew Cer,et al. Language-agnostic BERT Sentence Embedding , 2020, ACL.
[17] Juan Pino,et al. CoVoST 2 and Massively Multilingual Speech Translation , 2021, INTERSPEECH.
[18] Ruslan Salakhutdinov,et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[19] Eugene Kharitonov,et al. Speech Resynthesis from Discrete Disentangled Self-Supervised Representations , 2021, INTERSPEECH.
[20] Naman Goyal,et al. BASE Layers: Simplifying Training of Large, Sparse Models , 2021, ICML.
[21] Douglas W. Oard,et al. The Multilingual TEDx Corpus for Speech Recognition and Translation , 2021, INTERSPEECH.
[22] Emmanuel Dupoux,et al. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation , 2021, ACL.
[23] Holger Schwenk,et al. Beyond English-Centric Multilingual Machine Translation , 2020, J. Mach. Learn. Res.
[24] Orhan Firat,et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding , 2020, ICLR.
[25] Holger Schwenk,et al. CCMatrix: Mining Billions of High-Quality Parallel Sentences on the Web , 2019, ACL.
[26] Holger Schwenk,et al. Multimodal and Multilingual Embeddings for Large-Scale Speech Mining , 2021, NeurIPS.
[27] Gabriel Synnaeve,et al. Real Time Speech Enhancement in the Waveform Domain , 2020, INTERSPEECH.
[28] Abdel-rahman Mohamed,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[29] Iryna Gurevych,et al. Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation , 2020, EMNLP.
[30] Juan Pino,et al. CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus , 2020, LREC.
[31] Marjan Ghazvininejad,et al. Multilingual Denoising Pre-training for Neural Machine Translation , 2020, Transactions of the Association for Computational Linguistics.
[32] A. Sanchís,et al. Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Laurent Besacier,et al. MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible , 2019, LREC.
[34] Ray Kurzweil,et al. Multilingual Universal Sentence Encoder for Semantic Retrieval , 2019, ACL.
[35] Mattia Antonino Di Gangi,et al. MuST-C: a Multilingual Speech Translation Corpus , 2019, NAACL.
[36] Melvin Johnson,et al. Direct speech-to-speech translation with a sequence-to-sequence model , 2019, INTERSPEECH.
[37] Kyubyong Park,et al. CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages , 2019, INTERSPEECH.
[38] Ray Kurzweil,et al. Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax , 2019, IJCAI.
[39] Holger Schwenk,et al. Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond , 2018, Transactions of the Association for Computational Linguistics.
[40] Holger Schwenk,et al. Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings , 2018, ACL.
[41] Holger Schwenk,et al. Filtering and Mining Parallel Data in a Joint Multilingual Space , 2018, ACL.
[42] Houda Bouamor,et al. H2@BUCC18: Parallel Sentence Extraction from Comparable Corpora Using Multilingual Sentence Embeddings , 2018, BUCC@LREC.
[43] Josef van Genabith,et al. An Empirical Analysis of NMT-Derived Interlingual Embeddings and Their Use in Parallel Sentence Identification , 2017, IEEE Journal of Selected Topics in Signal Processing.
[44] Matthijs Douze,et al. Learning Joint Multilingual Sentence Representations with Neural Machine Translation , 2017, Rep4NLP@ACL.
[45] Tomoki Toda,et al. Improving translation of emphasis with pause prediction in speech-to-speech translation systems , 2015, IWSLT.
[46] Holger Schwenk,et al. On the Use of Comparable Corpora to Improve SMT performance , 2009, EACL.
[47] Satoshi Nakamura,et al. The ATR Multilingual Speech-to-Speech Translation System , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[48] Dragos Stefan Munteanu,et al. Improving Machine Translation Performance by Exploiting Non-Parallel Corpora , 2005, CL.
[49] Philip Resnik,et al. Mining the Web for Bilingual Text , 1999, ACL.