Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task

We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multi-core CPU, and GPU. At the model level, we use teacher-student training with a variety of student sizes, tie embeddings and sometimes layers, use the Simpler Simple Recurrent Unit, and introduce head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and used multiple processes with core affinity for the multi-core setting. To reduce model size, we experimented with 4-bit log quantization but use floats at runtime. In the shared task, most of our submissions were Pareto optimal with respect to the trade-off between time and quality.
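
The shared-task systems themselves are implemented in Marian (C++); the NumPy sketch below only illustrates the general idea of log-scale 4-bit quantization for model storage with float dequantization at runtime. The function names, the single global scale per tensor, and the sign-plus-3-bit-exponent code layout are illustrative assumptions, not the exact scheme used in the submissions.

```python
import numpy as np

def log_quantize(weights, bits=4):
    """Illustrative log-scale quantization: store a sign plus a small
    power-of-two exponent code per weight (sign + 3-bit code for bits=4).
    Packing the codes into actual 4-bit storage is omitted for clarity."""
    sign = np.sign(weights).astype(np.int8)
    absw = np.abs(weights)
    # One global scale per tensor so the largest magnitude maps to code 0.
    scale = float(absw.max())
    levels = 2 ** (bits - 1) - 1          # 7 exponent steps below the scale
    eps = np.finfo(weights.dtype).tiny    # avoid log2(0)
    exp = np.round(np.log2(absw / scale + eps))
    exp = np.clip(exp, -levels, 0).astype(np.int8)
    return sign, exp, scale

def log_dequantize(sign, exp, scale, dtype=np.float32):
    """Expand codes back to floats; runtime then uses ordinary float GEMMs."""
    return (sign * scale * np.exp2(exp.astype(dtype))).astype(dtype)

# Usage: quantize a weight matrix for compact storage, dequantize before inference.
w = np.random.randn(256, 256).astype(np.float32)
packed = log_quantize(w, bits=4)
w_restored = log_dequantize(*packed)
print(np.mean(np.abs(w - w_restored)))    # mean reconstruction error
```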
