Clustering data retrieved from Java source code to support software maintenance: a case study

Data mining has recently been used to support software maintenance in various contexts. Our work focuses on achieving a high-level understanding of Java systems without prior familiarity with them. Our thesis is that system structure and interrelationships, as well as similarities among program components, can be derived by applying cluster analysis to data extracted from source code. This paper proposes a methodology suitable for Java code analysis. It comprises a Java code analyser, which examines programs and constructs tables representing code syntax, and a clustering engine, which operates on these tables and identifies relationships among code elements. We evaluate the methodology on a medium-sized system, present initial results, and discuss directions for further work.
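To make the two-stage idea concrete, the following is a minimal sketch of the clustering step only, and not the engine described in the paper. It assumes the analyser has already produced a table with one row per class and the set of type names that class references. The table contents, the Jaccard similarity measure, the single-linkage merge rule, and the 0.4 threshold are all illustrative assumptions; the abstract does not state which features or which clustering algorithm are used.

```python
from itertools import combinations

# Hypothetical table such as an analyser might emit: one row per class,
# holding the set of type names the class references (illustrative data only).
CLASS_TABLE = {
    "OrderService": {"Order", "Customer", "Database"},
    "InvoiceService": {"Order", "Invoice", "Database"},
    "HtmlRenderer": {"Template", "Writer"},
    "PdfRenderer": {"Template", "Writer", "Font"},
}


def jaccard(a, b):
    """Jaccard similarity between two feature sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(table, threshold=0.4):
    """Single-linkage agglomerative clustering (an assumed technique):
    repeatedly merge the two most similar clusters until no pair of
    clusters reaches the similarity threshold."""
    clusters = [frozenset([name]) for name in sorted(table)]

    def linkage(c1, c2):
        # Single linkage: similarity of the closest pair of members.
        return max(jaccard(table[x], table[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(*best) < threshold:
            break
        merged = best[0] | best[1]
        clusters = [c for c in clusters if c not in best] + [merged]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    for group in cluster(CLASS_TABLE):
        print(group)
```

On this toy table the two service classes end up in one group and the two renderer classes in another, because each pair shares most of its referenced types. A different similarity measure or linkage rule could group the same rows differently.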
