Combining Contexts from Multiple Sources for Documentation-Specific Code Example Generation

Code examples are a crucial part of good documentation. They help developers understand the documentation and use the corresponding code unit (e.g., a method) properly. However, much official documentation still lacks (good) code examples, a common documentation issue identified by several studies. In this paper, we therefore consider automatic code example generation for documentation, a direction less explored by existing research. We employ Codex, a GPT-3-based model pre-trained on both natural and programming languages, to generate code examples from source code and documentation given as input. Our preliminary investigation of 40 scikit-learn methods shows that this approach can generate good code examples: 72.5% of the generated examples executed without error (passability), and 82.5% properly dealt with the target method and its documentation (relevance). We also find that incorporating error logs (produced when a failed code example is executed) into the input further improves passability from 72.5% to 87.5%. Our investigation thus lays the groundwork for documentation-specific code example generation and warrants in-depth future studies.
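The pipeline summarized above can be sketched in Python. Documentation and source code are combined into one model input, each generated example is executed, and the error log is fed back on failure. This is a minimal sketch under stated assumptions, not the authors' implementation. The prompt template, the `generate` callable standing in for a Codex call, the retry count (one initial attempt plus one retry with the error log), and all function names are hypothetical. Relevance is not computed here. Only passability, the fraction of examples that execute without error, is scored.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence, Tuple


def build_prompt(name: str, doc: str, source: str,
                 error_log: Optional[str] = None) -> str:
    """Combine the target method's documentation and source code into one prompt.

    Hypothetical template; the paper's exact prompt layout is not reproduced here.
    """
    lines = ["# Documentation:"]
    lines += ["# " + line for line in doc.splitlines()]
    lines += ["# Source code:", source]
    if error_log is not None:
        # Error-log context: tell the model why the previous example failed.
        lines.append("# The previous example failed with:")
        lines += ["# " + line for line in error_log.splitlines()]
    lines.append(f"# A short, runnable code example using {name}:")
    return "\n".join(lines)


def run_example(code: str, timeout: float = 30.0) -> Optional[str]:
    """Run a generated example in a fresh interpreter process.

    Returns None if it exits without error, otherwise the captured stderr.
    """
    try:
        result = subprocess.run([sys.executable, "-c", code],
                                capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"TimeoutExpired: example ran longer than {timeout} s"
    return None if result.returncode == 0 else result.stderr


def generate_example(name: str, doc: str, source: str,
                     generate: Callable[[str], str],
                     max_attempts: int = 2) -> Tuple[str, bool]:
    """Generate an example; on failure, retry with the error log in the prompt.

    `generate` is a hypothetical stand-in for a call to a code model such as Codex.
    """
    error_log: Optional[str] = None
    code = ""
    for _ in range(max_attempts):
        code = generate(build_prompt(name, doc, source, error_log))
        error_log = run_example(code)
        if error_log is None:
            return code, True
    return code, False


def passability(outcomes: Sequence[bool]) -> float:
    """Fraction of generated examples that executed without error."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(outcomes) / len(outcomes)


def fake_generate(prompt: str) -> str:
    # Toy model: emits a broken example first, a working one once it sees an error log.
    if "failed with" in prompt:
        return "print(sum([1, 2, 3]))"
    return "print(undefined_name)"


if __name__ == "__main__":
    code, ok = generate_example("sum", "Return the sum of an iterable.",
                                "def sum(iterable, /, start=0): ...", fake_generate)
    print(ok, code)
```

With 40 methods, a passability of 72.5% corresponds to 29 passing examples and 87.5% to 35. For instance, `passability([True] * 29 + [False] * 11)` returns 0.725. Running each example in a separate interpreter process keeps a crash or hang in one generated example from taking down the evaluation loop.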
