Code Llama: Open Foundation Models for Code
Manish P Bhatt | Jonas Gehring | Nicolas Usunier | Gabriel Synnaeve | Baptiste Rozière | J. Rapin | I. Evtimov | Yossi Adi | Tal Remez | Hugo Touvron | Wenhan Xiong | Xiaoqing Tan | Joanna Bitton | Louis Martin | Itai Gat | F. Azhar | Jade Copet | Thomas Scialom | Sten Sootla | Cristian Canton Ferrer | Fabian Gloeckle | Jingyu Liu | Artyom Kozhevnikov | Aaron Grattafiori | Alexandre Défossez
[1] Nicolas Papernot, et al. LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?, 2023, ArXiv.
[2] Eric Michael Smith, et al. Llama 2: Open Foundation and Fine-Tuned Chat Models, 2023, ArXiv.
[3] Deheng Ye, et al. RLTF: Reinforcement Learning from Unit Test Feedback, 2023, ArXiv.
[4] Nelson F. Liu, et al. Lost in the Middle: How Language Models Use Long Contexts, 2023, TACL.
[5] Li Dong, et al. LongNet: Scaling Transformers to 1,000,000,000 Tokens, 2023, ArXiv.
[6] Carolyn Jane Anderson, et al. MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation, 2023, IEEE Transactions on Software Engineering.
[7] Shouyuan Chen, et al. Extending Context Window of Large Language Models via Positional Interpolation, 2023, ArXiv.
[8] Julian McAuley, et al. LongCoder: A Long-Range Pre-trained Language Model for Code Completion, 2023, ICML.
[9] Harkirat Singh Behl, et al. Textbooks Are All You Need, 2023, ArXiv.
[10] Julien Launay, et al. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only, 2023, ArXiv.
[11] Siva Reddy, et al. The Impact of Positional Encoding on Length Generalization in Transformers, 2023, NeurIPS.
[12] Yejin Choi, et al. Impossible Distillation: from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing, 2023, ArXiv.
[13] M. Lewis, et al. MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers, 2023, NeurIPS.
[14] Harm de Vries, et al. StarCoder: may the source be with you!, 2023, ArXiv.
[15] S. Savarese, et al. CodeGen2: Lessons for Training LLMs on Programming and Natural Languages, 2023, ArXiv.
[16] Henrique Pondé de Oliveira Pinto, et al. GPT-4 Technical Report, 2023, ArXiv.
[17] Nikos Karampatziakis, et al. Meet in the Middle: A New Pre-training Paradigm, 2023, ArXiv.
[18] J. Tenenbaum, et al. Planning with Large Language Models for Code Generation, 2023, ICLR.
[19] Naman Goyal, et al. LLaMA: Open and Efficient Foundation Language Models, 2023, ArXiv.
[20] Harm de Vries, et al. SantaCoder: don't reach for the stars!, 2023, ArXiv.
[21] Li Dong, et al. A Length-Extrapolatable Transformer, 2022, ACL.
[22] Yanqiao Zhu, et al. A Survey on Pretrained Language Models for Neural Code Intelligence, 2022, ArXiv.
[23] Omer Levy, et al. Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor, 2022, ACL.
[24] Alexander M. Rush, et al. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model, 2022, ArXiv.
[25] Sumit Gulwani, et al. Repairing Bugs in Python Assignments Using Large Language Models, 2022, ArXiv.
[26] J. Schulman, et al. Efficient Training of Language Models to Fill in the Middle, 2022, ArXiv.
[27] Weizhu Chen, et al. CodeT: Code Generation with Generated Tests, 2022, ICLR.
[28] Akhilesh Deepak Gotmare, et al. CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, 2022, NeurIPS.
[29] Gabriel Synnaeve, et al. Code Translation with Compiler Representations, 2022, ICLR.
[30] Xi Victoria Lin, et al. OPT: Open Pre-trained Transformer Language Models, 2022, ArXiv.
[31] Julian Aron Prenner, et al. Can OpenAI's Codex Fix Bugs?: An evaluation on QuixBugs, 2022, 2022 IEEE/ACM International Workshop on Automated Program Repair (APR).
[32] Stella Rose Biderman, et al. GPT-NeoX-20B: An Open-Source Autoregressive Language Model, 2022, BigScience Workshop.
[33] Sida I. Wang, et al. InCoder: A Generative Model for Code Infilling and Synthesis, 2022, ICLR.
[34] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res..
[35] Omer Levy, et al. Transformer Language Models without Positional Encodings Still Learn Positional Information, 2022, EMNLP.
[36] Lisa Anne Hendricks, et al. Training Compute-Optimal Large Language Models, 2022, ArXiv.
[37] S. Savarese, et al. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis, 2022, ICLR.
[38] Dipankar Ray, et al. ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection, 2022, ACL.
[39] Ryan J. Lowe, et al. Training language models to follow instructions with human feedback, 2022, NeurIPS.
[40] Cherepanov, et al. Competition-level code generation with AlphaCode, 2022, Science.
[41] Dmytro Okhonko, et al. CM3: A Causal Masked Multimodal Model of the Internet, 2022, ArXiv.
[42] Po-Sen Huang, et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher, 2021, ArXiv.
[43] Mohammad Bavarian, et al. Training Verifiers to Solve Math Word Problems, 2021, ArXiv.
[44] Gabriel Synnaeve, et al. Leveraging Automated Unit Tests for Unsupervised Code Translation, 2021, ICLR.
[45] Owain Evans, et al. TruthfulQA: Measuring How Models Mimic Human Falsehoods, 2021, ACL.
[46] Quoc V. Le, et al. Finetuned Language Models Are Zero-Shot Learners, 2021, ICLR.
[47] Yue Wang, et al. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation, 2021, EMNLP.
[48] Noah A. Smith, et al. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, 2021, ICLR.
[49] Charles Sutton, et al. Program Synthesis with Large Language Models, 2021, ArXiv.
[50] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, ArXiv.
[51] Percy Liang, et al. Break-It-Fix-It: Unsupervised Learning for Program Repair, 2021, ICML.
[52] Dawn Song, et al. Measuring Coding Challenge Competence With APPS, 2021, NeurIPS Datasets and Benchmarks.
[53] Jianlin Su, et al. RoFormer: Enhanced Transformer with Rotary Position Embedding, 2021, Neurocomputing.
[54] Stella Biderman, et al. GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow, 2021.
[55] Kai-Wei Chang, et al. Unified Pre-training for Program Understanding and Generation, 2021, NAACL.
[56] Guillaume Lample, et al. DOBF: A Deobfuscation Pre-Training Objective for Programming Languages, 2021, NeurIPS.
[57] Neel Sundaresan, et al. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation, 2021, NeurIPS Datasets and Benchmarks.
[58] Kai-Wei Chang, et al. BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation, 2021, FAccT.
[59] Ming Zhou, et al. GraphCodeBERT: Pre-training Code Representations with Data Flow, 2020, ICLR.
[60] Neel Sundaresan, et al. Unit Test Case Generation with Transformers, 2020, ArXiv.
[61] Joseph E. Gonzalez, et al. Contrastive Code Representation Learning, 2020, EMNLP.
[62] Guillaume Lample, et al. Unsupervised Translation of Programming Languages, 2020, NeurIPS.
[63] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, ArXiv.
[64] Ting Liu, et al. CodeBERT: A Pre-Trained Model for Programming and Natural Languages, 2020, Findings of EMNLP.
[65] Andrew Rice, et al. Learning to Fix Build Errors with Graph2Diff Neural Networks, 2019, ICSE.
[66] Chris Quirk, et al. Novel positional encodings to enable tree-based transformers, 2019, NeurIPS.
[67] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res..
[68] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[69] Yejin Choi, et al. The Curious Case of Neural Text Degeneration, 2019, ICLR.
[70] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[71] Miltiadis Allamanis, et al. The adverse effects of code duplication in machine learning models of code, 2018, Onward!.
[72] Inioluwa Deborah Raji, et al. Model Cards for Model Reporting, 2018, FAT.
[73] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[74] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[75] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[76] Alexandra Birch, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[77] Eric Gilbert, et al. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text, 2014, ICWSM.
[78] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[79] Ge Li, et al. Integrating Tree Path in Transformer for Code Representation, 2021, NeurIPS.