How far can we get with one GPU in 100 hours? CoAStaL at MultiIndicMT Shared Task

This work shows that competitive translation results can be obtained in a constrained setting by incorporating recent advances in memory and compute optimization. We train and evaluate large multilingual translation models on a single GPU for at most 100 hours and come within 4-5 BLEU points of the top submission on the leaderboard. We also benchmark standard baselines on the PMI corpus and rediscover well-known shortcomings of translation systems and metrics.
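The abstract points to memory and compute optimizations as the key to fitting large multilingual translation models onto a single GPU within a fixed time budget. Below is a minimal sketch of what such a setup could look like, assuming a PyTorch / Hugging Face Transformers stack with mixed precision, gradient checkpointing, and gradient accumulation; the checkpoint name, hyperparameters, and toy sentence pair are illustrative assumptions, not details taken from the paper.

```python
# Minimal single-GPU, memory-frugal fine-tuning sketch (assumed setup, not the paper's exact recipe).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-small"  # placeholder multilingual checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).cuda()
model.gradient_checkpointing_enable()      # trade extra compute for lower activation memory

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
scaler = torch.cuda.amp.GradScaler()       # mixed-precision loss scaling
accum_steps = 8                            # simulate a larger batch on one GPU

# Toy sentence pair standing in for PMI-corpus training data.
src = ["The parliament met today."]
tgt = ["संसद की बैठक आज हुई।"]
batch = tokenizer(src, return_tensors="pt", padding=True).to("cuda")
labels = tokenizer(text_target=tgt, return_tensors="pt", padding=True).input_ids.to("cuda")

model.train()
for step in range(accum_steps):
    with torch.cuda.amp.autocast():
        loss = model(**batch, labels=labels).loss / accum_steps
    scaler.scale(loss).backward()          # accumulate scaled gradients
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```

Gradient checkpointing, mixed precision, and gradient accumulation all spend extra compute or steps to shrink the memory footprint, which is the kind of trade-off that makes a strict single-GPU, 100-hour budget workable.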
