Identifying composite crosscutting concerns through semi‐supervised learning

Aspect mining improves the modularity of legacy software systems by identifying their underlying crosscutting concerns (CCs). However, a realistic CC is a composite one, consisting of CC seeds together with related program elements, which makes composite CCs difficult to identify. In this paper, inspired by state‐of‐the‐art information retrieval techniques, we model this problem as a semi‐supervised learning problem. First, a link analysis technique is adopted to generate CC seeds. Second, we construct a coupling graph that captures the relationships between CC seeds. Third, we apply a community detection technique to partition the CC seeds into groups, which serve as constraints that guide the semi‐supervised clustering process. Finally, we propose a semi‐supervised graph clustering approach, named constrained authority‐shift clustering, to identify composite CCs: two measures, similarity and connectivity, are defined, and a seeded graph is generated for clustering program elements. We evaluate constrained authority‐shift clustering on numerous software systems, including a large‐scale distributed software system. The experimental results demonstrate that our semi‐supervised learning approach is more effective in detecting composite CCs. Copyright © 2013 John Wiley & Sons, Ltd.
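The seed generation, coupling, and grouping steps described above can be illustrated with a small sketch. The abstract does not say which link analysis algorithm, coupling measure, or community detection method is used, so the following choices are assumptions: seeds are the top-k methods by PageRank on a caller -> callee call graph, two seeds are coupled by the number of callers they share, and seeds are grouped by connected components of the thresholded coupling graph. That last step is a deliberately simple stand-in for a real community detection algorithm, not the method used in the paper. All function names and parameters (`cc_seeds`, `coupling_graph`, `group_seeds`, `k`, `min_shared`) are hypothetical, and the constrained authority‐shift clustering step itself is not shown.

```python
from collections import defaultdict
from itertools import combinations


def pagerank(edges, damping=0.85, iters=100):
    """Power-iteration PageRank over a directed call graph (caller -> callee)."""
    nodes = sorted({n for edge in edges for n in edge})
    out = defaultdict(list)
    for caller, callee in edges:
        out[caller].append(callee)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Methods with no outgoing calls (dangling nodes) spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


def cc_seeds(edges, k):
    """Assumption: CC seeds are simply the k highest-ranked methods."""
    rank = pagerank(edges)
    return sorted(rank, key=rank.get, reverse=True)[:k]


def coupling_graph(edges, seeds):
    """Assumption: weight each seed pair by the number of callers they share."""
    callers = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    return {
        (a, b): len(callers[a] & callers[b])
        for a, b in combinations(sorted(seeds), 2)
    }


def group_seeds(seeds, coupling, min_shared=2):
    """Stand-in for community detection: union-find over strongly coupled pairs."""
    parent = {s: s for s in seeds}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), weight in coupling.items():
        if weight >= min_shared:
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for s in seeds:
        groups[find(s)].append(s)
    return sorted(sorted(g) for g in groups.values())


# Toy call graph (hypothetical method names).
calls = [
    ("createOrder", "log"), ("createOrder", "checkAuth"), ("createOrder", "saveOrder"),
    ("cancelOrder", "log"), ("cancelOrder", "checkAuth"), ("cancelOrder", "deleteOrder"),
    ("updateUser", "log"), ("updateUser", "checkAuth"), ("updateUser", "saveUser"),
    ("flushCache", "lock"), ("flushCache", "unlock"),
    ("rebuildIndex", "lock"), ("rebuildIndex", "unlock"),
]
seeds = cc_seeds(calls, k=4)
print(group_seeds(seeds, coupling_graph(calls, seeds)))
# [['checkAuth', 'log'], ['lock', 'unlock']]
```

In this toy call graph, `log` and `checkAuth` are always invoked by the same callers, as are `lock` and `unlock`, so each pair lands in one group. Per the abstract, groups of this kind would then act as constraints for the semi‐supervised clustering of the remaining program elements.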