A Topic Modeling Approach To Evaluate The Comments Consistency To Source Code

Comments, the parts of the code that the compiler ignores, make up a significant share of the source code in software systems. They are a primary source of system documentation and are crucial to the work of software maintainers: they support code traceability and maintenance activities, and they help developers who use the code as a library or framework in other projects. Although many software developers consider comments important, existing approaches to software quality analysis largely disregard them and focus only on the source code. This paper presents an approach, based on topic modeling, for analyzing the consistency of comments with the source code. A model is proposed to assess the quality of comments in terms of consistency, since comments should be consistent with the source code they refer to. The results show a similar trend in the topic distributions, and almost all classes are associated with no more than 3 topics.
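To make the idea of comparing topic distributions concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes that a topic model has already produced, for each class, one topic distribution for its code and one for its comments. The choice of Jensen-Shannon divergence as the comparison measure, the 0.1 probability threshold, and the function names `consistency` and `dominant_topics` are all illustrative assumptions; the abstract does not specify them.

```python
import math


def normalize(weights):
    """Scale non-negative topic weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    return [w / total for w in weights]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), bounded between 0 and 1."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    p, q = normalize(p), normalize(q)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def consistency(code_topics, comment_topics):
    """Assumed score: 1 for identical distributions, 0 for disjoint ones."""
    return 1.0 - js_divergence(code_topics, comment_topics)


def dominant_topics(topics, threshold=0.1):
    """Indices of topics whose probability reaches the (assumed) threshold."""
    return [i for i, p in enumerate(normalize(topics)) if p >= threshold]


if __name__ == "__main__":
    # Hypothetical distributions over four topics for a single class.
    code_topics = [0.70, 0.20, 0.05, 0.05]
    comment_topics = [0.65, 0.25, 0.05, 0.05]
    print("consistency:", round(consistency(code_topics, comment_topics), 3))
    print("dominant code topics:", dominant_topics(code_topics))
```

Under these assumptions, a class whose comments follow the same topics as its code scores close to 1, and counting the dominant topics per class is one way to check an observation such as "no more than 3 topics".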
