How far can we get with one GPU in 100 hours? CoAStaL at MultiIndicMT Shared Task

This work shows that competitive translation results can be obtained in a constrained setting by incorporating recent advances in memory and compute optimization. We train and evaluate large multilingual translation models on a single GPU for at most 100 hours and come within 4-5 BLEU points of the top submission on the leaderboard. We also benchmark standard baselines on the PMI corpus and rediscover well-known shortcomings of translation systems and metrics.
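The abstract points to memory and compute optimizations as the key to fitting large multilingual translation models onto a single GPU within a fixed time budget. Below is a minimal sketch of what such a setup could look like, assuming a PyTorch / Hugging Face Transformers stack with mixed precision, gradient checkpointing, and gradient accumulation; the checkpoint name, hyperparameters, and toy sentence pair are illustrative assumptions, not details taken from the paper.

```python
# Minimal single-GPU, memory-frugal fine-tuning sketch (assumed setup, not the paper's exact recipe).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-small"  # placeholder multilingual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).cuda()
model.gradient_checkpointing_enable()      # trade extra compute for lower activation memory

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
scaler = torch.cuda.amp.GradScaler()       # mixed-precision loss scaling
accum_steps = 8                            # simulate a larger batch on one GPU

# Toy sentence pair standing in for PMI-corpus training data.
src = ["The parliament met today."]
tgt = ["संसद की बैठक आज हुई।"]
batch = tokenizer(src, return_tensors="pt", padding=True).to("cuda")
labels = tokenizer(text_target=tgt, return_tensors="pt", padding=True).input_ids.to("cuda")

model.train()
for step in range(accum_steps):
    with torch.cuda.amp.autocast():
        loss = model(**batch, labels=labels).loss / accum_steps
    scaler.scale(loss).backward()          # accumulate scaled gradients
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```

Gradient checkpointing, mixed precision, and gradient accumulation all spend extra compute or steps to shrink the memory footprint, which is the kind of trade-off that makes a strict single-GPU, 100-hour budget workable.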
