Di He | Guolin Ke | Tie-Yan Liu | Chengxuan Ying