Tao Lei | Jeremy Wohlwend | Howard Chen | Alexander Lin
[1] Myle Ott, et al. Scaling Neural Machine Translation, 2018, WMT.
[2] Yu Zhang, et al. Simple Recurrent Units for Highly Parallelizable Recurrence, 2017, EMNLP.
[3] Christopher D. Manning, et al. Get To The Point: Summarization with Pointer-Generator Networks, 2017, ACL.
[4] Samy Bengio, et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, 2015, NIPS.
[5] J. Andrew Bagnell, et al. Efficient Reductions for Imitation Learning, 2010, AISTATS.
[6] John Langford, et al. Learning to Search Better than Your Teacher, 2015, ICML.
[7] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[8] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[9] Dan Garber, et al. Logarithmic Regret for Online Gradient Descent Beyond Strong Convexity, 2018, AISTATS.
[10] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[11] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[12] Elad Hazan, et al. Logarithmic Regret Algorithms for Online Convex Optimization, 2006, Machine Learning.
[13] Yang Liu, et al. A Teacher-Student Framework for Zero-Resource Neural Machine Translation, 2017, ACL.
[14] Christopher D. Manning, et al. A Copy-Augmented Sequence-to-Sequence Architecture Gives Good Performance on Task-Oriented Dialogue, 2017, EACL.
[15] Marc'Aurelio Ranzato, et al. Sequence Level Training with Recurrent Neural Networks, 2015, ICLR.
[16] Graham Neubig, et al. Understanding Knowledge Distillation in Non-autoregressive Machine Translation, 2019, ICLR.
[17] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[18] Ferenc Huszár, et al. How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?, 2015, arXiv.
[19] Alexander M. Rush, et al. Sequence-Level Knowledge Distillation, 2016, EMNLP.
[20] Di He, et al. Multilingual Neural Machine Translation with Knowledge Distillation, 2019, ICLR.
[21] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[22] Alexander M. Rush, et al. Latent Alignment and Variational Attention, 2018, NeurIPS.
[23] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.
[24] Xiaodong Liu, et al. Unified Language Model Pre-training for Natural Language Understanding and Generation, 2019, NeurIPS.
[25] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[26] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[27] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv.
[28] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[29] Victor O. K. Li, et al. Non-Autoregressive Neural Machine Translation, 2017, ICLR.
[30] Hao Zhou, et al. Imitation Learning for Non-Autoregressive Neural Machine Translation, 2019, ACL.
[31] Kyunghyun Cho, et al. Query-Efficient Imitation Learning for End-to-End Autonomous Driving, 2016, arXiv.
[32] Jason Weston, et al. A Neural Attention Model for Abstractive Sentence Summarization, 2015, EMNLP.
[33] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[34] Dilin Wang, et al. Improving Neural Language Modeling via Adversarial Training, 2019, ICML.
[35] Noah A. Smith, et al. Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser, 2016, EMNLP.
[36] Chris Dyer, et al. Differentiable Scheduled Sampling for Credit Assignment, 2017, ACL.
[37] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL.
[38] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[39] Byron Boots, et al. Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction, 2017, ICML.
[40] Yang Feng, et al. Bridging the Gap between Training and Inference for Neural Machine Translation, 2019, ACL.
[41] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[42] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, arXiv.
[43] Mirella Lapata, et al. Text Summarization with Pretrained Encoders, 2019, EMNLP.
[44] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[45] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[46] Joelle Pineau, et al. An Actor-Critic Algorithm for Sequence Prediction, 2016, ICLR.
[47] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[48] Xu Tan, et al. FastSpeech: Fast, Robust and Controllable Text to Speech, 2019, NeurIPS.
[49] Nicholas Matthews, et al. Flambé: A Customizable Framework for Machine Learning Experiments, 2019, ACL.
[50] Lior Wolf, et al. Using the Output Embedding to Improve Language Models, 2016, EACL.
[51] Rich Caruana, et al. Model Compression, 2006, KDD.
[52] Anton Osokin, et al. SEARNN: Training RNNs with Global-Local Losses, 2017, ICLR.
[53] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.