Token Dropping for Efficient BERT Pretraining