Recovering Concepts from Source Code with Automated Concept Identification

The complexity of the systems that software engineers build has grown continuously since the inception of the field. What has not changed is the engineer's mental capacity to operate on about seven distinct pieces of information at a time. Improvements such as the widespread use of UML have made software design activities more abstract; the same cannot be said for reverse engineering activities. The well-known concept assignment problem is still being solved at the line-by-line level of source code analysis. Introducing abstraction to the problem will allow engineers to move farther from the details of the system, increasing their ability to see the role that domain-level concepts play in it. In this paper we present a technique that facilitates filtering classes from existing systems at the source level based on their relationship to the core concepts in the domain. This approach can simplify reverse engineering and design recovery, as well as other activities that require a mapping to domain-level concepts.
