Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models