TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference
Deming Ye | Yankai Lin | Yufei Huang | Maosong Sun