[1] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[2] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, ArXiv.
[3] Omer Levy, et al. Are Sixteen Heads Really Better than One?, 2019, NeurIPS.
[4] Quoc V. Le, et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, 2018, ArXiv.
[5] Omer Levy, et al. SpanBERT: Improving Pre-training by Representing and Predicting Spans, 2019, TACL.
[6] Ming-Wei Chang, et al. Latent Retrieval for Weakly Supervised Open Domain Question Answering, 2019, ACL.
[7] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[8] Eunsol Choi, et al. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension, 2017, ACL.
[9] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[10] Myle Ott, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences, 2019, PNAS.
[11] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[12] Kilian Q. Weinberger, et al. CondenseNet: An Efficient DenseNet Using Learned Group Convolutions, 2018, CVPR.
[13] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[14] Jimmy J. Lin, et al. Distilling Task-Specific Knowledge from BERT into Simple Neural Networks, 2019, ArXiv.
[15] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[16] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[17] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.
[18] Kyunghyun Cho, et al. Passage Re-ranking with BERT, 2019, ArXiv.
[19] Tianqi Chen, et al. Training Deep Nets with Sublinear Memory Cost, 2016, ArXiv.
[20] Philip Bachman, et al. NewsQA: A Machine Comprehension Dataset, 2016, Rep4NLP@ACL.
[21] James Henderson, et al. Document-Level Neural Machine Translation with Hierarchical Attention Networks, 2018, EMNLP.
[22] Diederik P. Kingma, et al. GPU Kernels for Block-Sparse Weights, 2017.
[23] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[24] Percy Liang, et al. Know What You Don’t Know: Unanswerable Questions for SQuAD, 2018, ACL.
[25] Yoshua Bengio, et al. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, 2018, EMNLP.
[26] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[27] Qun Liu, et al. TinyBERT: Distilling BERT for Natural Language Understanding, 2020, EMNLP.
[28] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[29] John Canny, et al. Evaluating Protein Transfer Learning with TAPE, 2019, bioRxiv.
[30] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[31] Chang Zhou, et al. CogLTX: Applying BERT to Long Texts, 2020, NeurIPS.
[32] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[33] Omer Levy, et al. BERT for Coreference Resolution: Baselines and Analysis, 2019, EMNLP/IJCNLP.
[34] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[35] Christopher Ré, et al. Low-Memory Neural Network Training: A Technical Report, 2019, ArXiv.
[36] Edouard Grave, et al. Adaptive Attention Span in Transformers, 2019, ACL.
[37] Wenhu Chen, et al. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting, 2019, NeurIPS.
[38] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[39] Jeffrey Ling, et al. Matching the Blanks: Distributional Similarity for Relation Learning, 2019, ACL.
[40] Fedor Moiseev, et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, 2019, ACL.
[41] André F. T. Martins, et al. Adaptively Sparse Transformers, 2019, EMNLP.
[42] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[43] Kyunghyun Cho, et al. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine, 2017, ArXiv.
[44] Hao Wu, et al. Mixed Precision Training, 2017, ICLR.
[45] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.