Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation

Code generation aims to automatically generate source code from high-level task specifications, which can significantly increase the productivity of software engineering. Recently, approaches based on large language models (LLMs) have shown remarkable code generation abilities on simple tasks. However, generating code for more complex tasks, such as competition-level problems, remains challenging. In this paper, we introduce Brainstorm, a framework for code generation. It leverages a brainstorming step that generates and selects diverse thoughts on the problem to facilitate algorithmic reasoning, where each thought is a possible blueprint for solving the problem. We demonstrate that Brainstorm significantly enhances the ability of LLMs to solve competition-level programming problems, resulting in a more than 50% increase in the pass@$k$ metrics for ChatGPT on the CodeContests benchmark and achieving state-of-the-art performance. Furthermore, our experiments on LeetCode contests show that our framework boosts the ability of ChatGPT to a level comparable to that of human programmers.
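To make the described pipeline concrete, below is a minimal sketch of what a brainstorm-then-generate loop could look like, together with the standard unbiased pass@$k$ estimator used for evaluation. The `complete(prompt)` call stands in for any text-completion LLM interface; the prompts, the selection heuristic, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import math
from typing import Callable, List


def brainstorm_then_generate(
    problem: str,
    complete: Callable[[str], str],  # assumed generic LLM completion interface
    num_thoughts: int = 3,
) -> str:
    """Illustrative brainstorm-then-generate pipeline (not the paper's exact prompts).

    1. Brainstorm: sample several high-level "thoughts" (solution blueprints).
    2. Select: keep the thought the model judges most promising.
    3. Generate: write code conditioned on the problem plus the chosen thought.
    """
    # Step 1: sample diverse candidate thoughts about how to solve the problem.
    thoughts: List[str] = [
        complete(
            f"Problem:\n{problem}\n\n"
            "Describe, in a few sentences, one possible algorithmic approach."
        )
        for _ in range(num_thoughts)
    ]

    # Step 2: a simple selection heuristic -- ask the model to pick one blueprint.
    menu = "\n".join(f"[{i}] {t}" for i, t in enumerate(thoughts))
    choice = complete(
        f"Problem:\n{problem}\n\nCandidate approaches:\n{menu}\n\n"
        "Reply with the index of the most promising approach."
    ).strip()
    best = thoughts[int(choice[0])] if choice[:1].isdigit() and int(choice[0]) < len(thoughts) else thoughts[0]

    # Step 3: generate code guided by the selected thought.
    return complete(
        f"Problem:\n{problem}\n\nApproach:\n{best}\n\n"
        "Write a complete Python solution following this approach."
    )


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability that at least
    one of k programs drawn from n samples (c of which are correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```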
