Using fold-in and fold-out in the architecture recovery of software systems

In this paper we present an approach to automate the architecture recovery of software systems. The approach builds on information retrieval and clustering techniques: it uses Latent Semantic Indexing (LSI) to compute similarities among software entities (e.g., programs or classes) and the k-means clustering algorithm to group software entities that implement similar functionality. To reduce computation time as the software evolves, and thereby reduce energy waste, the architecture recovery process can also be applied using fold-in and fold-out mechanisms, which respectively add software entities to and remove them from the LSI representation of the software system under study. The approach has been implemented in a prototype supporting tool as an Eclipse plug-in. Finally, to assess the approach and the plug-in, we conducted an empirical investigation on five open source software systems implemented in Java and C/C++. The investigation places special emphasis on the effect of using the fold-in and fold-out mechanisms.
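The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation or the Eclipse plug-in: the term-by-entity matrix, the entity names in the comments, the function names, the Euclidean distance, and the farthest-point initialization of k-means are all assumptions made for illustration. Fold-in uses the standard LSI projection of a new term vector d onto the existing space, d^T U_k S_k^{-1}; fold-out simply drops the entity's row from the reduced representation. Neither operation recomputes the SVD, which is where the time saving comes from.

```python
import numpy as np

def lsi(term_entity, k):
    """Truncated SVD of a term-by-entity matrix.
    Returns U_k (terms x k), s_k (k singular values), V_k (entities x k)."""
    U, s, Vt = np.linalg.svd(term_entity, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(U_k, s_k, V_k, term_vector):
    """Append a new entity by projecting its term vector: d^T U_k S_k^{-1}."""
    coords = (term_vector @ U_k) / s_k
    return np.vstack([V_k, coords])

def fold_out(V_k, index):
    """Remove an entity's coordinates without recomputing the SVD."""
    return np.delete(V_k, index, axis=0)

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization
    (an assumption of this sketch, chosen to keep the example reproducible)."""
    centers = [points[0]]
    while len(centers) < k:
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(nearest))])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

# Hypothetical term counts: rows are terms, columns are four entities
# (say two parsing classes followed by two drawing classes).
A = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
])

U_k, s_k, V_k = lsi(A, k=2)
labels = kmeans(V_k, k=2)

# Fold in a new entity whose vocabulary resembles the first two entities.
new_entity = np.array([2.0, 2.0, 0.0, 0.0, 1.0])
V_in = fold_in(U_k, s_k, V_k, new_entity)
labels_in = kmeans(V_in, k=2)

# Fold out the fourth original entity (index 3).
V_out = fold_out(V_in, 3)
labels_out = kmeans(V_out, k=2)
```

In this toy example the folded-in entity joins the group of the two entities it shares vocabulary with. Because fold-in and fold-out leave U_k and s_k untouched, the representation can drift from what a fresh SVD would give after many updates, so a periodic full recomputation is a reasonable design choice.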
