Scaling Neural Machine Translation
Myle Ott | Sergey Edunov | David Grangier | Michael Auli
[1] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[2] Joelle Pineau,et al. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues , 2016, AAAI.
[3] Philipp Koehn,et al. Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.
[4] Hao Wu,et al. Mixed Precision Training , 2017, ICLR.
[5] Min Ye,et al. Communication-Computation Efficient Gradient Coding , 2018, ICML.
[6] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[7] Elad Hoffer,et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks , 2017, NIPS.
[8] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Richard Socher,et al. Weighted Transformer Network for Machine Translation , 2017, ArXiv.
[10] Geoffrey E. Hinton,et al. Regularizing Neural Networks by Penalizing Confident Output Distributions , 2017, ICLR.
[11] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[12] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.
[13] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[14] Ankur Bapna,et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation , 2018, ACL.
[15] Kunle Olukotun,et al. High-Accuracy Low-Precision Training , 2018, ArXiv.
[16] Samy Bengio,et al. Revisiting Distributed Synchronous SGD , 2016, ArXiv.
[17] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[18] Miguel Ballesteros,et al. Pieces of Eight: 8-bit Neural Machine Translation , 2018, NAACL.
[19] Philipp Koehn,et al. Zipporah: a Fast and Scalable Data Cleaning System for Noisy Web-Crawled Parallel Corpora , 2017, EMNLP.
[20] Quoc V. Le,et al. Don't Decay the Learning Rate, Increase the Batch Size , 2017, ICLR.
[21] Ashish Vaswani,et al. Self-Attention with Relative Position Representations , 2018, NAACL.
[22] Jianfeng Gao,et al. A Neural Network Approach to Context-Sensitive Generation of Conversational Responses , 2015, NAACL.
[23] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[24] Alexandros G. Dimakis,et al. Gradient Coding: Avoiding Stragglers in Distributed Learning , 2017, ICML.
[25] Yoshua Bengio,et al. Training deep neural networks with low precision multiplications , 2014.
[26] Patrice Y. Simard,et al. Backpropagation without Multiplication , 1993, NIPS.
[27] Matt Post,et al. A Call for Clarity in Reporting BLEU Scores , 2018, WMT.
[28] Bryan Catanzaro,et al. Large Scale Language Modeling: Converging on 40GB of Text in Four Hours , 2018, 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
[29] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[30] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Christopher D. Manning,et al. Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.
[32] Ondrej Dusek,et al. Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings , 2016, ACL.
[33] Richard Socher,et al. A Deep Reinforced Model for Abstractive Summarization , 2017, ICLR.
[34] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[35] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[36] Marc'Aurelio Ranzato,et al. Analyzing Uncertainty in Neural Machine Translation , 2018, ICML.