AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
Se Jung Kwon | Jeonghoon Kim | Jeongin Bae | Kang Min Yoo | Jin-Hwa Kim | Baeseong Park | Byeongwook Kim | Jung-Woo Ha | Nako Sung | Dongsoo Lee