Di He | Xu Tan | Tao Qin | Tie-Yan Liu | Yi Ren | Zhou Zhao
[1] sitecore cryan. A Gift of Knowledge, 2000, PS: Political Science & Politics.
[2] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[3] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[4] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[5] Dianhai Yu, et al. Multi-Task Learning for Multiple Language Translation, 2015, ACL.
[6] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[7] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[8] Yoshua Bengio, et al. FitNets: Hints for Thin Deep Nets, 2014, ICLR.
[9] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[10] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[11] Jan Niehues, et al. Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder, 2016, IWSLT.
[12] Alexander M. Rush, et al. Sequence-Level Knowledge Distillation, 2016, EMNLP.
[13] Yoshua Bengio, et al. Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism, 2016, NAACL.
[14] Quoc V. Le, et al. Multi-task Sequence to Sequence Learning, 2015, ICLR.
[15] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[16] Junmo Kim, et al. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning, 2017, CVPR.
[17] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[18] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[19] Yale Song, et al. Learning from Noisy Labels with Distillation, 2017, ICCV.
[20] Markus Freitag, et al. Ensemble Distillation for Neural Machine Translation, 2017, ArXiv.
[21] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[22] Martin Wattenberg, et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, 2016, TACL.
[23] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[24] Yichao Lu, et al. A neural interlingua for multilingual machine translation, 2018, WMT.
[25] Graham Neubig, et al. Rapid Adaptation of Neural Machine Translation to New Languages, 2018, EMNLP.
[26] Victor O. K. Li, et al. Universal Neural Machine Translation for Extremely Low Resource Languages, 2018, NAACL.
[27] Zachary Chase Lipton, et al. Born Again Neural Networks, 2018, ICML.
[28] Di He, et al. Dense Information Flow for Neural Machine Translation, 2018, NAACL.
[29] Graham Neubig, et al. When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?, 2018, NAACL.
[30] Alan L. Yuille, et al. Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students, 2018, ArXiv.
[31] Di He, et al. Double Path Networks for Sequence to Sequence Learning, 2018, COLING.
[32] Yong Wang, et al. Meta-Learning for Low-Resource Neural Machine Translation, 2018, EMNLP.
[33] Di He, et al. Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation, 2018, NeurIPS.
[34] Lijun Wu, et al. Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter, 2018, EMNLP.
[35] Xu Lan, et al. Knowledge Distillation by On-the-Fly Native Ensemble, 2018, NeurIPS.
[36] Geoffrey E. Hinton, et al. Large scale distributed neural network training through online distillation, 2018, ICLR.
[37] Lijun Wu, et al. Achieving Human Parity on Automatic Chinese to English News Translation, 2018, ArXiv.
[38] Huchuan Lu, et al. Deep Mutual Learning, 2018, CVPR.
[39] Di He, et al. Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder, 2019, AAAI.
[40] Di He, et al. Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input, 2018, AAAI.
[41] Di He, et al. Sentence-wise Smooth Regularization for Sequence to Sequence Learning, 2018, AAAI.