Data mining source code to facilitate program comprehension: experiments on clustering data retrieved from C++ programs

This paper presents ongoing work on using data mining to discover knowledge about software systems, thereby facilitating program comprehension. We discuss how this work fits into the context of tool-supported maintenance and comprehension, and report on applying a new methodology to C++ programs. The overall framework can provide practical insights and guide maintainers through the specifics of a system, even when they have little familiarity with it. The contribution of this work is twofold: it provides a model and an associated method for extracting data from C++ source code for subsequent mining, and it evaluates a proposed framework for clustering such data to obtain useful knowledge. The methodology is evaluated on three open source applications; results are assessed and conclusions are presented. The paper concludes with directions for future work.
