A new algorithm for software clustering considering the knowledge of dependency between artifacts in the source code

Abstract

Context: Software systems evolve over time to meet new user requirements. These requirements are usually not reflected in the system's original documentation, so each new version deviates further from the documented architecture. As a result, the system becomes harder to understand and harder to change. Clustering techniques are used to recover the architecture of a software system so that it can be understood. Clustering usually operates on an artifact dependency graph (ADG) extracted from the source code. The literature presents both hierarchical and search-based clustering methods for extracting software architecture. Hierarchical algorithms run in reasonable time but are not able to find a good architecture. Search-based algorithms often find better architectures, but their time and space requirements make them impractical for large-scale software systems. Both families of methods overlook the knowledge already present in an ADG.

Objective: To overcome the limitations of existing clustering methods, this paper presents a new deterministic clustering algorithm, the Neighborhood tree algorithm.

Method: The new algorithm builds a neighborhood tree from the knowledge available in an ADG and uses this tree for clustering.

Results: Our initial results indicate that, compared with hierarchical and search-based algorithms, the algorithm is better able to extract an acceptable architecture in reasonable time.

Conclusions: The proposed clustering algorithm is expected to greatly assist software engineers in extracting meaningful and understandable subsystems from source code.
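To make the notion of an ADG concrete, the following minimal sketch represents one as a mapping from each artifact to the artifacts it depends on, and counts how many dependencies a candidate clustering keeps inside clusters versus across them. The file names, the `adg` dictionary, and the `edge_counts` helper are hypothetical illustrations and do not come from the paper. The sketch does not implement the Neighborhood tree algorithm; it only shows the kind of input a clustering method works on and one simple way to inspect a result.

```python
# Hypothetical artifact dependency graph (ADG): each key is a source artifact,
# each value is the set of artifacts it depends on (e.g. calls or includes).
adg = {
    "parser.c": {"lexer.c", "ast.c"},
    "lexer.c": {"ast.c"},
    "ast.c": set(),
    "codegen.c": {"ast.c", "emit.c"},
    "emit.c": set(),
}


def edge_counts(adg, clusters):
    """Return (intra, inter): dependencies inside clusters and across clusters."""
    # Map every artifact to the index of the cluster that contains it.
    owner = {artifact: i for i, members in enumerate(clusters) for artifact in members}
    intra = inter = 0
    for src, targets in adg.items():
        for dst in targets:
            if owner[src] == owner[dst]:
                intra += 1
            else:
                inter += 1
    return intra, inter


# A candidate clustering into two subsystems (an assumption for illustration).
clusters = [{"parser.c", "lexer.c", "ast.c"}, {"codegen.c", "emit.c"}]
intra, inter = edge_counts(adg, clusters)
print(f"intra-cluster edges: {intra}, inter-cluster edges: {inter}")
```

A clustering with many intra-cluster and few inter-cluster dependencies is generally considered easier to understand, which is the intuition behind the quality measures that clustering tools optimize.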
