Improve Transformer Models with Better Relative Position Embeddings
Davis Liang | Bing Xiang | Zhiheng Huang | Peng Xu