A graph-based clustering algorithm for software systems modularization

Abstract

Context: Clustering algorithms are used as a modularization technique to help engineers understand and refactor large software systems. These algorithms partition the source code of a software system into smaller, easy-to-manage modules (clusters). The resulting decomposition is called the software system structure (or software architecture). Because the modularization problem is NP-hard, evolutionary clustering approaches such as genetic algorithms have been used to solve it. These methods make little use of the information available in the artifact dependency graph extracted from the source code.

Objective: To overcome this limitation of existing modularization techniques, this paper presents a new modularization technique named GMA (Graph-based Modularization Algorithm).

Methods: GMA is a graph-based clustering algorithm for software modularization. It uses the depth of relationships to compute the similarity between artifacts, which enables the algorithm to exploit graph-theoretic information. In addition, seven new criteria are proposed to evaluate the quality of a modularization.

Results: To demonstrate the applicability of the proposed algorithm, ten folders of Mozilla Firefox with different domains and functions, along with four other applications, were selected. The experimental results show that the proposed algorithm produces modularizations closer to the human expert's decomposition (i.e., the directory structure) than existing algorithms.

Conclusion: The proposed algorithm is expected to help software designers extract easy-to-manage, understandable modules from source code during reverse engineering.
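The abstract does not give the similarity formula, so the following is only a minimal sketch of what a depth-based similarity between artifacts could look like, not GMA's actual definition. It assumes the artifact dependency graph is available as an undirected adjacency mapping, measures depth as the breadth-first hop count between two artifacts, and scores a pair as 1 / (1 + depth). The function names, the `max_depth` cutoff, the scoring formula, and the file names in the toy graph are all hypothetical.

```python
from collections import deque


def depths_from(graph, source, max_depth):
    """Breadth-first search from `source`.

    Returns a dict mapping each artifact reachable within `max_depth`
    hops to its hop count (depth) from `source`.
    """
    depths = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the depth cutoff
        for neighbor in graph.get(node, ()):
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


def depth_similarity(graph, a, b, max_depth=3):
    """Hypothetical similarity (an assumption, not the paper's formula):
    1 / (1 + depth) if `b` is within `max_depth` hops of `a`, else 0.
    """
    depth = depths_from(graph, a, max_depth).get(b)
    return 0.0 if depth is None else 1.0 / (1 + depth)


# Toy artifact dependency graph with made-up file names,
# stored as an undirected adjacency mapping.
graph = {
    "parser.c": ["lexer.c", "ast.c"],
    "lexer.c": ["parser.c"],
    "ast.c": ["parser.c", "codegen.c"],
    "codegen.c": ["ast.c"],
    "logger.c": [],
}

direct = depth_similarity(graph, "parser.c", "lexer.c")      # one hop apart
indirect = depth_similarity(graph, "lexer.c", "codegen.c")   # three hops apart
unrelated = depth_similarity(graph, "lexer.c", "logger.c")   # not connected
```

In this sketch, directly dependent artifacts score higher than artifacts connected only through intermediaries, and unconnected artifacts score zero. A clustering step could then group artifacts with high pairwise scores, which is one way indirect dependencies in the graph might inform the modularization.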
