OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff | Qian Liu | Armel Zebaze | Qinkai Zheng | Binyuan Hui | Terry Yue Zhuo | Swayam Singh | Xiangru Tang | Leandro von Werra | S. Longpre
[1] Yiling Lou,et al. ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation , 2023, ArXiv.
[2] Jiaxin Zhang,et al. PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback , 2023, ArXiv.
[3] Eric Michael Smith,et al. Llama 2: Open Foundation and Fine-Tuned Chat Models , 2023, ArXiv.
[4] James Y. Zou,et al. How is ChatGPT's behavior changing over time? , 2023, ArXiv.
[5] Nelson F. Liu,et al. Lost in the Middle: How Language Models Use Long Contexts , 2023, TACL.
[6] Alham Fikri Aji,et al. Style Over Substance: Evaluation Biases for Large Language Models , 2023, ArXiv.
[7] Yan Zeng,et al. What Matters in Training a GPT4-Style Language Model with Multimodal Inputs? , 2023, ArXiv.
[8] Yew Ken Chia,et al. Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning , 2023, ArXiv.
[9] Carolyn Jane Anderson,et al. MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation , 2023, IEEE Transactions on Software Engineering.
[10] Shouyuan Chen,et al. Extending Context Window of Large Language Models via Positional Interpolation , 2023, ArXiv.
[11] Harm de Vries,et al. RepoFusion: Training Code Models to Understand Your Repository , 2023, ArXiv.
[12] Can Xu,et al. WizardCoder: Empowering Code Large Language Models with Evol-Instruct , 2023, ArXiv.
[13] Khyathi Raghavi Chandu,et al. How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources , 2023, ArXiv.
[14] Carolyn Jane Anderson,et al. StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code , 2023, ArXiv.
[15] Julian McAuley,et al. RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems , 2023, ArXiv.
[16] Han Zhang,et al. Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding , 2023, ArXiv.
[17] Nghi D. Q. Bui,et al. CodeTF: One-stop Transformer Library for State-of-the-art Code LLM , 2023, ArXiv.
[18] Zhouchen Lin,et al. Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models , 2023, ArXiv.
[19] Jiayi Wei,et al. Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing , 2023, ArXiv.
[20] Alexander M. Rush,et al. Scaling Data-Constrained Language Models , 2023, ArXiv.
[21] S. Levine,et al. The False Promise of Imitating Proprietary LLMs , 2023, ArXiv.
[22] Luke Zettlemoyer,et al. QLoRA: Efficient Finetuning of Quantized LLMs , 2023, NeurIPS.
[23] Quentin G. Anthony,et al. RWKV: Reinventing RNNs for the Transformer Era , 2023, EMNLP.
[24] Barret Zoph,et al. A Pretrainer’s Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity , 2023, NAACL.
[25] Weizhu Chen,et al. CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing , 2023, ArXiv.
[26] Omer Levy,et al. LIMA: Less Is More for Alignment , 2023, NeurIPS.
[27] Dongyeop Kang,et al. CoEdIT: Text Editing by Task-Specific Instruction Tuning , 2023, ArXiv.
[28] Nghi D. Q. Bui,et al. CodeT5+: Open Code Large Language Models for Code Understanding and Generation , 2023, EMNLP.
[29] Harm de Vries,et al. StarCoder: may the source be with you! , 2023, ArXiv.
[30] Yuanhan Zhang,et al. Otter: A Multi-Modal Model with In-Context Instruction Tuning , 2023, ArXiv.
[31] S. Savarese,et al. CodeGen2: Lessons for Training LLMs on Programming and Natural Languages , 2023, ArXiv.
[32] Lingming Zhang,et al. Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation , 2023, ArXiv.
[33] Terry Yue Zhuo. Large Language Models Are State-of-the-Art Evaluators of Code Generation , 2023, ArXiv.
[34] Can Xu,et al. WizardLM: Empowering Large Language Models to Follow Complex Instructions , 2023, ArXiv.
[35] Luiza Amador Pozzobon,et al. On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research , 2023, EMNLP.
[36] Stella Rose Biderman,et al. Emergent and Predictable Memorization in Large Language Models , 2023, NeurIPS.
[37] Yong Jae Lee,et al. Visual Instruction Tuning , 2023, ArXiv.
[38] Ge Li,et al. Self-collaboration Code Generation via ChatGPT , 2023, ACM Transactions on Software Engineering and Methodology.
[39] Zhi Rui Tam,et al. OpenAssistant Conversations - Democratizing Large Language Model Alignment , 2023, ArXiv.
[40] Xinyun Chen,et al. Teaching Large Language Models to Self-Debug , 2023, ArXiv.
[41] Shuvendu K. Lahiri,et al. Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions , 2023, ArXiv.
[42] Julian Aron Prenner,et al. RunBugRun - An Executable Dataset for Automated Program Repair , 2023, ArXiv.
[43] Oskar van der Wal,et al. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling , 2023, ICML.
[44] Zhilin Yang,et al. CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X , 2023, ArXiv.
[45] Bodhisattwa Prasad Majumder,et al. Self-Refine: Iterative Refinement with Self-Feedback , 2023, NeurIPS.
[46] Dan Iter,et al. G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment , 2023, EMNLP.
[47] L. B. Kristensen,et al. Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting , 2023, ArXiv.
[48] Yi Mao,et al. RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation , 2023, ArXiv.
[49] Marco Tulio Ribeiro,et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4 , 2023, ArXiv.
[50] David Ifeoluwa Adelani,et al. The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset , 2023, NeurIPS.
[51] Dragomir R. Radev,et al. LEVER: Learning to Verify Language-to-Code Generation with Execution , 2023, ICML.
[52] Graham Neubig,et al. Learning Performance-Improving Code Edits , 2023, ArXiv.
[53] Nan Jiang,et al. Impact of Code Language Models on Automated Program Repair , 2023, ICSE.
[54] Sumit Agarwal,et al. CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code , 2023, ArXiv.
[55] Pengfei Liu,et al. GPTScore: Evaluate as You Desire , 2023, NAACL.
[56] J. Malmaud,et al. Measuring The Impact Of Programming Language Distribution , 2023, ICML.
[57] Tao Xie,et al. CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models , 2023, ArXiv.
[58] Quoc V. Le,et al. The Flan Collection: Designing Data and Methods for Effective Instruction Tuning , 2023, ICML.
[59] Lingming Zhang,et al. Conversational Automated Program Repair , 2023, ArXiv.
[60] J. Petke,et al. An Analysis of the Automatic Bug Fixing Performance of ChatGPT , 2023, APR.
[61] Harm de Vries,et al. SantaCoder: don't reach for the stars! , 2023, ArXiv.
[62] Xi Victoria Lin,et al. OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization , 2022, ArXiv.
[63] Ying Shen,et al. MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning , 2022, ACL.
[64] Noah A. Smith,et al. Self-Instruct: Aligning Language Models with Self-Generated Instructions , 2022, ACL.
[65] Graham Neubig,et al. Execution-Based Evaluation for Open-Domain Code Generation , 2022, EMNLP.
[66] Wasi Uddin Ahmad,et al. CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context , 2022, ArXiv.
[67] Xi Victoria Lin,et al. Training Trajectories of Language Models Across Scales , 2022, ACL.
[68] Dragomir R. Radev,et al. BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting , 2022, ACL.
[69] Sida I. Wang,et al. Coder Reviewer Reranking for Code Generation , 2022, ICML.
[70] Todd Mytkowicz,et al. CodeExp: Explanatory Code Document Generation , 2022, EMNLP.
[71] Harm de Vries,et al. The Stack: 3 TB of permissively licensed source code , 2022, ArXiv.
[72] Jamie Callan,et al. PAL: Program-aided Language Models , 2022, ICML.
[73] Luke Zettlemoyer,et al. DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation , 2022, ICML.
[74] Guillem Cucurull,et al. Galactica: A Large Language Model for Science , 2022, ArXiv.
[75] Alexander M. Rush,et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model , 2022, ArXiv.
[76] E. Tuzun,et al. Assessing the quality of GitHub Copilot’s code generation , 2022, PROMISE.
[77] Jimmy Ba,et al. Large Language Models Are Human-Level Prompt Engineers , 2022, ICLR.
[78] Sujan Kumar Gonugondla,et al. Multi-lingual Evaluation of Code Generation Models , 2022, ICLR.
[79] Zheng Xin Yong,et al. What Language Model to Train if You Have One Million GPU Hours? , 2022, EMNLP.
[80] Andrew M. Dai,et al. Scaling Instruction-Finetuned Language Models , 2022, ArXiv.
[81] Nils Reimers,et al. MTEB: Massive Text Embedding Benchmark , 2022, EACL.
[82] Xiaofei Xie,et al. TransRepair: Context-aware Program Repair for Compilation Errors , 2022, ASE.
[83] Sumit Gulwani,et al. Repairing Bugs in Python Assignments Using Large Language Models , 2022, ArXiv.
[84] Edouard Grave,et al. EditEval: An Instruction-Based Benchmark for Text Improvements , 2022, ArXiv.
[85] Edouard Grave,et al. PEER: A Collaborative Language Model , 2022, ICLR.
[86] Tom B. Brown,et al. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned , 2022, ArXiv.
[87] Junyi Jessy Li,et al. CoditT5: Pretraining for Source Code and Natural Language Editing , 2022, ASE.
[88] J. Schulman,et al. Efficient Training of Language Models to Fill in the Middle , 2022, ArXiv.
[89] Fenia Christopoulou,et al. PanGu-Coder: Program Synthesis with Function-Level Language Modeling , 2022, ArXiv.
[90] Weizhu Chen,et al. CodeT: Code Generation with Generated Tests , 2022, ICLR.
[91] H. Larochelle,et al. Repository-Level Prompt Generation for Large Language Models of Code , 2022, ICML.
[92] Kenneth O. Stanley,et al. Evolution through Large Models , 2022, ArXiv.
[93] Chandan K. Reddy,et al. XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence , 2022, ArXiv.
[94] Hanghang Tong,et al. Combining Code Context and Fine-grained Code Difference for Commit Message Generation , 2022, Internetware.
[95] Gerard de Melo,et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models , 2022, ArXiv.
[96] Daniel Y. Fu,et al. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness , 2022, NeurIPS.
[97] Graham Neubig,et al. Learning to Model Editing Processes , 2022, EMNLP.
[98] A. Eghbali,et al. CrystalBLEU: Precisely and Efficiently Measuring the Similarity of Code , 2022, ICSE-Companion.
[99] Martin T. Vechev,et al. On Distribution Shift in Learning-based Bug Detectors , 2022, ICML.
[100] Noah A. Smith,et al. Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks , 2022, EMNLP.
[101] Stella Rose Biderman,et al. GPT-NeoX-20B: An Open-Source Autoregressive Language Model , 2022, BIGSCIENCE.
[102] Sida I. Wang,et al. InCoder: A Generative Model for Code Infilling and Synthesis , 2022, ICLR.
[103] Tom B. Brown,et al. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback , 2022, ArXiv.
[104] S. Savarese,et al. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis , 2022, ICLR.
[105] D. Schuurmans,et al. Self-Consistency Improves Chain of Thought Reasoning in Language Models , 2022, ICLR.
[106] Frank F. Xu,et al. MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages , 2022, FINDINGS.
[107] Ryan J. Lowe,et al. Training language models to follow instructions with human feedback , 2022, NeurIPS.
[108] Frank F. Xu,et al. A systematic evaluation of large language models of code , 2022, MAPS@PLDI.
[109] Junchao Wang,et al. A Survey of Automatic Source Code Summarization , 2022, Symmetry.
[110] Niklas Muennighoff. SGPT: GPT Sentence Embeddings for Semantic Search , 2022, ArXiv.
[111] A. Cherepanov,et al. Competition-level code generation with AlphaCode , 2022, Science.
[112] Gerard de Melo,et al. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation , 2021, Northern European Journal of Language Technology.
[113] David Bieber,et al. Show Your Work: Scratchpads for Intermediate Computation with Language Models , 2021, ArXiv.
[114] Romain Robbes,et al. Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs , 2021, ArXiv.
[115] M. Lewis,et al. MetaICL: Learning to Learn In Context , 2021, NAACL.
[116] Alexander M. Rush,et al. Multitask Prompted Training Enables Zero-Shot Task Generalization , 2021, ICLR.
[117] Mohit Iyyer,et al. Do Long-Range Language Models Actually Use Long-Range Context? , 2021, EMNLP.
[118] Quoc V. Le,et al. Finetuned Language Models Are Zero-Shot Learners , 2021, ICLR.
[119] Noah A. Smith,et al. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation , 2021, ICLR.
[120] Charles Sutton,et al. Program Synthesis with Large Language Models , 2021, ArXiv.
[121] Zhongxing Yu,et al. Megadiff: A Dataset of 600k Java Source Code Changes Categorized by Diff Size , 2021, ArXiv.
[122] Hongyu Zhang,et al. On the Evaluation of Neural Code Summarization , 2021, ICSE.
[123] Wojciech Zaremba,et al. Evaluating Large Language Models Trained on Code , 2021, ArXiv.
[124] Percy Liang,et al. Break-It-Fix-It: Unsupervised Learning for Program Repair , 2021, ICML.
[125] Tae-Hwan Jung,et al. CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model , 2021, NLP4PROG.
[126] Douwe Kiela,et al. True Few-Shot Learning with Language Models , 2021, NeurIPS.
[127] Dawn Song,et al. Measuring Coding Challenge Competence With APPS , 2021, NeurIPS Datasets and Benchmarks.
[128] Neel Sundaresan,et al. DeepDebug: Fixing Python Bugs Using Stack Traces, Backtranslation, and Code Skeletons , 2021, ArXiv.
[129] Stella Biderman,et al. GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow , 2021.
[130] Neel Sundaresan,et al. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation , 2021, NeurIPS Datasets and Benchmarks.
[131] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[132] Baishakhi Ray,et al. A Transformer-based Approach for Source Code Summarization , 2020, ACL.
[133] Rishabh Singh,et al. Global Relational Models of Source Code , 2020, ICLR.
[134] Noam Shazeer,et al. Fast Transformer Decoding: One Write-Head is All You Need , 2019, ArXiv.
[135] Satish Chandra,et al. Neural Code Search Evaluation Dataset , 2019, ArXiv.
[136] Omer Levy,et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems , 2019, NeurIPS.
[137] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[138] Ehud Reiter,et al. A Structured Review of the Validity of BLEU , 2018, CL.
[139] Graham Neubig,et al. Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow , 2018, MSR.
[140] Matias Martinez,et al. A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark , 2018, IBF.
[141] Armando Solar-Lezama,et al. QuixBugs: a multi-lingual program repair benchmark set based on the Quixey challenge , 2017, SPLASH.
[142] Rico Sennrich,et al. A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code Generation , 2017, IJCNLP.
[143] Natalie Schluter,et al. The limits of automatic summarisation according to ROUGE , 2017, EACL.
[144] Alvin Cheung,et al. Summarizing Source Code using a Neural Attention Model , 2016, ACL.
[145] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[146] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL.
[147] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[148] Dragomir R. Radev,et al. Crosslingual Generalization through Multitask Finetuning , 2023, ACL.
[149] Shafiq R. Joty,et al. xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval , 2023, ArXiv.
[150] P. Pantel,et al. The Hateful Memes Challenge: Competition Report , 2020, NeurIPS.
[151] Jean Carletta,et al. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization , 2005, ACL.