Software Architecture Recovery through Similarity-Based Graph Clustering

Software architecture recovery aims to gain an architectural-level understanding of a software system when no description of its architecture exists. In recent years, researchers have adopted various software clustering techniques to detect the hierarchical structure of software systems. Most graph clustering techniques focus on the connectivity between program elements but overlook similarity, which is also a key measure for identifying elements that belong to the same module. In this paper we propose a novel hierarchical graph clustering algorithm, DGHC, which considers both similarity and connectivity between program elements. During the transformation of the program dependence graph, edges representing similarity between elements are added. Similar elements are then grouped by density-based approaches. An alternative strategy is adopted to find groups of elements that are both closely connected and similar. Meanwhile, we adjust the contributions of connectivity and similarity through a flexible clustering algorithm based on a short random walk model, which captures more structural information about the software and helps to find its multiple layers. Furthermore, a new method called Multi-layer Propagation Gap is proposed to suggest the stable layers of the hierarchical clustering result as the multiple layers of the software system. Extensive experimental results, obtained through comparison with various software clustering methods, illustrate the effectiveness and efficiency of DGHC in detecting the hierarchical structure of software.
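
To make the idea of mixing connectivity and similarity in a short random walk concrete, the following is a minimal illustrative sketch, not the DGHC algorithm itself. Everything in it is an assumption made for illustration: the Jaccard neighbourhood overlap used as the similarity measure, the mixing parameter `alpha`, the function names, and the toy dependence graph of two triangles joined by one edge.

```python
def jaccard_similarity(adj, u, v):
    """Neighbourhood overlap of u and v (an assumed similarity measure)."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def combined_weights(adj, alpha):
    """Edge weight = alpha * connectivity + (1 - alpha) * similarity."""
    weights = {u: {} for u in adj}
    for u in adj:
        for v in adj:
            if u == v:
                continue
            connectivity = 1.0 if v in adj[u] else 0.0
            value = alpha * connectivity + (1 - alpha) * jaccard_similarity(adj, u, v)
            if value > 0:
                weights[u][v] = value
    return weights


def transition_matrix(weights):
    """Row-normalise the weights into random walk transition probabilities."""
    P = {}
    for u, row in weights.items():
        total = sum(row.values())
        # A node without any weighted edge keeps its mass on itself.
        P[u] = {v: w / total for v, w in row.items()} if total else {u: 1.0}
    return P


def short_walk(P, start, steps):
    """Distribution of a walker after a small number of steps from `start`."""
    dist = {u: 0.0 for u in P}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {u: 0.0 for u in P}
        for u, mass in dist.items():
            for v, p in P[u].items():
                nxt[v] += mass * p
        dist = nxt
    return dist


# Hypothetical undirected dependence graph: triangles {a, b, c} and {d, e, f}
# joined by the single edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
P = transition_matrix(combined_weights(adj, alpha=0.5))
dist = short_walk(P, "a", steps=2)
for node in sorted(dist):
    print(node, round(dist[node], 3))
```

In this toy graph, a two-step walk started at `a` keeps most of its probability mass inside the triangle containing `a`, which is the kind of signal a walk-based clustering step can use to group elements. Raising `alpha` makes the walk follow dependence edges more strictly, while lowering it lets similarity pull the walker toward elements with overlapping neighbourhoods.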
