Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling
[1] Y. Choi, et al. Balancing Lexical and Semantic Quality in Abstractive Summarization, 2023, ACL.
[2] Wei Wu, et al. Robust Lottery Tickets for Pre-trained Language Models, 2022, ACL.
[3] Hyung Won Chung, et al. Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?, 2022, EMNLP.
[4] Kentaro Inui, et al. Diverse Lottery Tickets Boost Ensemble from a Single Pretrained Model, 2022, BIGSCIENCE.
[5] Qun Liu, et al. Exploring Extreme Parameter Compression for Pre-trained Language Models, 2022, ICLR.
[6] T. Zhao, et al. CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing, 2022, ACL.
[7] Christos Tsirigotis, et al. Simplicial Embeddings in Self-Supervised Learning and Downstream Classification, 2022, ICLR.
[8] Richard Yuanzhe Pang, et al. Token Dropping for Efficient BERT Pretraining, 2022, ACL.
[9] Alessandro Moschitti, et al. Ensemble Transformer for Efficient and Accurate Ranking Tasks: An Application to Question Answering Systems, 2022, EMNLP.
[10] Jun Huang, et al. From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression, 2021, AAAI.
[11] Jacob Eisenstein, et al. The MultiBERTs: BERT Reproductions for Robustness Analysis, 2021, ICLR.
[12] E. Chng, et al. An Embarrassingly Simple Model for Dialogue Relation Extraction, 2020, ICASSP 2022.
[13] Xiaodong Liu, et al. AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models, 2022, arXiv.
[14] Gang Chen, et al. SkipBERT: Efficient Inference with Shallow Layer Skipping, 2022, ACL.
[15] Xuanjing Huang, et al. Flooding-X: Improving BERT’s Resistance to Adversarial Attacks via Loss-Restricted Fine-Tuning, 2022, ACL.
[16] Jacob Eisenstein, et al. Sparse, Dense, and Attentional Representations for Text Retrieval, 2020, Transactions of the Association for Computational Linguistics.
[17] Nikita Nangia, et al. Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? An Extensive Empirical Study on Language Tasks, 2021.
[18] Percy Liang, et al. Prefix-Tuning: Optimizing Continuous Prompts for Generation, 2021, ACL.
[19] Yuanzhi Li, et al. Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning, 2020, ICLR.
[20] Frank Rudzicz, et al. On Losses for Modern Language Models, 2020, EMNLP.
[21] Dan Iter, et al. Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models, 2020, ACL.
[22] M. Zaharia, et al. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, 2020, SIGIR.
[23] Xipeng Qiu, et al. Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation, 2020, Journal of Computer Science and Technology.
[24] Dustin Tran, et al. BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning, 2020, ICLR.
[25] Hao Tian, et al. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding, 2019, AAAI.
[26] Mirella Lapata, et al. Text Summarization with Pretrained Encoders, 2019, EMNLP.
[27] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[28] Omer Levy, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019, NeurIPS.
[29] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[30] Samuel R. Bowman, et al. Neural Network Acceptability Judgments, 2018, Transactions of the Association for Computational Linguistics.
[31] Dan Roth, et al. Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences, 2018, NAACL.
[32] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[33] Andrew Gordon Wilson, et al. Averaging Weights Leads to Wider Optima and Better Generalization, 2018, UAI.
[34] Honglak Lee, et al. An Efficient Framework for Learning Sentence Representations, 2018, ICLR.
[35] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[36] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[37] Nitish Srivastava, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, 2014, Journal of Machine Learning Research.
[38] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.
[39] Zornitsa Kozareva, et al. SemEval-2012 Task 7: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning, 2011, *SEMEVAL.
[40] Hector J. Levesque, et al. The Winograd Schema Challenge, 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.
[41] Daniel Allen, et al. The Transformer, 2000, Nursing Standard.
[42] David MacKay. Probable Networks and Plausible Predictions - A Review of Practical Bayesian Methods for Supervised Neural Networks, 1995.
[43] D. Ruppert. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.