Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

Large Language Models (LLMs) possess impressive capabilities to generate meaningful code snippets from natural language intents in a zero-shot manner, i.e., without the need for task-specific fine-tuning. To unleash their full potential, prior work has demonstrated the benefits of fine-tuning these models on task-specific data. However, the fine-tuning process incurs heavy computational costs and becomes intractable when resources are scarce, especially for models with billions of parameters. In light of these challenges, previous studies have explored In-Context Learning (ICL) as an effective strategy for generating contextually appropriate code without fine-tuning. However, ICL operates purely at inference time and does not learn any task-specific parameters, which can limit performance on downstream tasks. In this context, we foresee that Parameter-Efficient Fine-Tuning (PEFT) techniques carry a high potential for efficiently specializing LLMs to task-specific data. In this paper, we deliver a comprehensive study of the impact of PEFT techniques on LLMs in the automated code generation scenario. Our experimental results reveal the superiority and potential of these techniques over ICL across a wide range of LLMs, both in reducing the computational burden and in improving performance. The study thus opens opportunities for broader applications of PEFT in software engineering scenarios.
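
For illustration, the sketch below shows how one PEFT technique, LoRA, can be attached to an open code LLM using the Hugging Face peft library: the base model is frozen and only small low-rank adapter matrices are trained. The checkpoint, hyperparameters, and target-module names are assumptions chosen for this example and are not the exact configuration evaluated in the paper.

```python
# Minimal sketch of applying LoRA to a code LLM (assumes the `transformers` and `peft`
# packages are installed; model choice and hyperparameters are illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "Salesforce/codegen-350M-mono"  # small code LLM, used here only as an example
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA freezes the pre-trained weights and learns low-rank update matrices instead,
# so only a small fraction of parameters is optimized on the task-specific data.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                          # rank of the low-rank decomposition
    lora_alpha=16,                # scaling factor applied to the LoRA updates
    lora_dropout=0.05,
    target_modules=["qkv_proj"],  # attention projection name in CodeGen-style models
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```

The resulting model can then be fine-tuned on natural-language-to-code pairs with a standard causal language modeling objective, after which the trained adapter weights can be saved and shared separately from the (much larger) frozen base model.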
