Software Remodularization by Estimating Structural and Conceptual Relations Among Classes and Using Hierarchical Clustering

In this paper, we present a software remodularization technique that estimates structural and conceptual relations among software elements (classes). The technique combines structural and semantic (conceptual) coupling measurements to obtain more accurate coupling measures. In particular, it extracts lexical information from six parts of a class's source code: comments, class names, attribute names, method signatures, parameter names, and method body statements. As a structural coupling measure, it also counts the member functions of other classes that a given class uses. Structural coupling among classes is measured with the information-flow-based coupling metric (ICP), and conceptual coupling is measured by tokenizing the source code and computing the cosine similarity between classes. Classes are then grouped using Hierarchical Agglomerative Clustering (HAC). The technique is evaluated on three standard open-source Java software systems. The results show higher accuracy when compared against each system's gold-standard modularization, which supports the use of the approach for remodularization.
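The combination of conceptual and structural coupling summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The stop-word list, the camelCase splitting rule, the `(target_class, n_params)` call-record format, the ICP-style weighting of each call by one plus the number of parameters, the max-normalization of structural scores, and the equal blending weight `alpha = 0.5` are all hypothetical choices. The class text is assumed to be already extracted from the six source-code parts listed above.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list (English words plus common Java keywords);
# the paper's actual preprocessing may differ.
STOP_WORDS = {"the", "a", "an", "of", "to", "from", "and", "is", "in", "for",
              "get", "set", "class", "void", "double", "int", "public",
              "private", "return", "new", "string", "list", "system", "out"}


def split_terms(text):
    """Split camelCase identifiers, lowercase them, and drop stop words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", spaced)
            if len(w) > 1 and w.lower() not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def icp_coupling(invocations, src, dst):
    """ICP-style count (assumed weighting): each call from src to a method
    of dst contributes 1 + the number of parameters of the called method."""
    return sum(1 + n_params for target, n_params in invocations.get(src, [])
               if target == dst)


def combined_similarity(classes, texts, invocations, alpha=0.5):
    """Blend normalized structural coupling with conceptual cosine similarity.
    alpha is a hypothetical weight; the paper's combination may differ."""
    vectors = {c: Counter(split_terms(texts[c])) for c in classes}
    raw = {}
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            # Treat coupling as symmetric by summing both call directions.
            raw[(a, b)] = (icp_coupling(invocations, a, b)
                           + icp_coupling(invocations, b, a))
    max_raw = max(raw.values(), default=0) or 1  # avoid division by zero
    return {pair: alpha * s / max_raw
                  + (1 - alpha) * cosine(vectors[pair[0]], vectors[pair[1]])
            for pair, s in raw.items()}


# Toy input: class texts and (target_class, n_params) call records are hypothetical.
classes = ["Account", "Bank", "Logger"]
texts = {
    "Account": "class Account { double balance; "
               "void deposit(double amount) { balance += amount; } }",
    "Bank": "class Bank { List<Account> accounts; void transfer(Account from, "
            "Account to, double amount) { from.withdraw(amount); to.deposit(amount); } }",
    "Logger": "class Logger { void log(String message) "
              "{ System.out.println(message); } }",
}
invocations = {"Bank": [("Account", 1), ("Account", 1)]}

sim = combined_similarity(classes, texts, invocations)
for pair, value in sim.items():
    print(pair, round(value, 3))
```

In this toy input, `Bank` both calls `Account` and shares vocabulary with it, so that pair scores highest, while `Logger` shares neither calls nor terms with the other classes and scores zero against both.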

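The clustering step can be illustrated with a plain-Python agglomerative routine. The summary does not state which linkage criterion or stopping rule the paper uses, so average linkage and a fixed target number of clusters are assumptions here. The `hac` function name and the similarity values in the example are invented for illustration. In practice, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"`, applied to a condensed distance matrix such as `1 - similarity`, would be a common alternative.

```python
from itertools import combinations


def hac(items, sim, num_clusters):
    """Average-linkage agglomerative clustering (hypothetical helper).

    sim maps frozenset({a, b}) -> similarity; missing pairs count as 0.
    Merging stops once num_clusters clusters remain (assumed stopping rule).
    """
    clusters = [frozenset([x]) for x in items]

    def linkage(c1, c2):
        # Average pairwise similarity between members of the two clusters.
        total = sum(sim.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > num_clusters:
        # Find the most similar pair of clusters and merge it.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Invented similarities between five hypothetical classes.
items = ["Account", "Bank", "Customer", "Logger", "LogFormatter"]
sim = {frozenset(pair): value for pair, value in [
    (("Account", "Bank"), 0.8),
    (("Account", "Customer"), 0.6),
    (("Bank", "Customer"), 0.5),
    (("Logger", "LogFormatter"), 0.9),
    (("Bank", "Logger"), 0.1),
]}

for cluster in hac(items, sim, num_clusters=2):
    print(sorted(cluster))
```

With these values, the routine first merges `Logger` with `LogFormatter`, then `Account` with `Bank`, and finally adds `Customer` to the `Account`/`Bank` group, leaving the logging classes in a separate module.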