An empirical study of the textual similarity between source code and source code summaries

Source code documentation often contains summaries of source code written by authors. Recently, automatic source code summarization tools have emerged that generate summaries without requiring author intervention. These summaries are designed to help readers understand the high-level concepts of the source code. Unfortunately, there is no agreed-upon understanding of what makes up a “good summary.” This paper presents an empirical study of source code summaries written by authors, readers, and automatic source code summarization tools. The study examines the textual similarity between source code and its summaries using Short Text Semantic Similarity metrics. We found that readers use source code in their summaries more than authors do. Additionally, we found that the accuracy of a human-written summary can be estimated by the textual similarity of that summary to the source code.
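To make the idea of textual similarity between source code and a summary concrete, the following is a minimal sketch in Python. It is not one of the Short Text Semantic Similarity metrics used in the study; it assumes a much simpler stand-in, the fraction of summary words that also occur in the code after splitting identifiers. The function names (`split_identifier`, `keywords`, `overlap_similarity`) and the Java snippet are hypothetical and chosen only for illustration.

```python
import re


def split_identifier(token):
    """Split a camelCase or snake_case identifier into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", token).replace("_", " ")
    return [part.lower() for part in spaced.split()]


def keywords(text):
    """Return the set of lowercase words found in code or prose."""
    words = set()
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        words.update(split_identifier(token))
    return words


def overlap_similarity(source_code, summary):
    """Fraction of distinct summary words that also appear in the source code.

    Illustrative stand-in only: it matches exact words and ignores the
    semantic relatedness that a real STSS metric would account for.
    """
    code_words = keywords(source_code)
    summary_words = keywords(summary)
    if not summary_words:
        return 0.0
    return len(summary_words & code_words) / len(summary_words)


# Hypothetical example: one Java method and two candidate summaries.
code = "public int getTotalPrice(Cart cart) { return cart.sum(); }"
close_summary = "Returns the total price of the cart"
distant_summary = "Computes an order amount"

print(overlap_similarity(code, close_summary))
print(overlap_similarity(code, distant_summary))
```

Under this simplified measure, a summary that reuses words from the identifiers scores higher than one that paraphrases them, which is the kind of difference the study compares across authors, readers, and tools.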
