Combining Machine Learning and Information Retrieval Techniques for Software Clustering

In the field of Software Maintenance the definition of effective approaches to partition a software system into meaningful subsystems is a longstanding and relevant research topic. These techniques are very important as they can significantly support a Maintainer in his/her tasks by grouping related entities of a large system into smaller and easier to comprehend subsystems.

[1]  Arun Lakhotia,et al.  Toward experimental evaluation of subsystem classification recovery techniques , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[2]  Nicolas Anquetil,et al.  Experiments with clustering as a software remodularization method , 1999, Sixth Working Conference on Reverse Engineering (Cat. No.PR00303).

[3]  G. McLachlan,et al.  The EM Algorithm and Extensions: Second Edition , 2008 .

[4]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[5]  Richard C. Holt,et al.  Linux as a case study: its extracted software architecture , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[6]  Rainer Koschke,et al.  A framework for experimental evaluation of clustering techniques , 2000, Proceedings IWPC 2000. 8th International Workshop on Program Comprehension.

[7]  Emily Hill,et al.  Mining source code to automatically split identifiers for software analysis , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[8]  Spiros Mancoridis,et al.  On the automatic modularization of software systems using the Bunch tool , 2006, IEEE Transactions on Software Engineering.

[9]  Richard C. Holt,et al.  On the stability of software clustering algorithms , 2000, Proceedings IWPC 2000. 8th International Workshop on Program Comprehension.

[10]  Xiaogang Wang,et al.  Clustering large software systems at multiple layers , 2007, Inf. Softw. Technol..

[11]  Vassilios Tzerpos,et al.  An effectiveness measure for software clustering algorithms , 2004, Proceedings. 12th IEEE International Workshop on Program Comprehension, 2004..

[12]  Giuseppe Scanniello,et al.  Investigating the use of lexical information for software system clustering , 2011, 2011 15th European Conference on Software Maintenance and Reengineering.

[13]  Periklis Andritsos,et al.  Information-theoretic software clustering , 2005, IEEE Transactions on Software Engineering.

[14]  Richard C. Holt,et al.  MoJo: a distance metric for software clusterings , 1999, Sixth Working Conference on Reverse Engineering (Cat. No.PR00303).

[15]  Oscar Nierstrasz,et al.  The story of moose: an agile reengineering environment , 2005, ESEC/FSE-13.

[16]  Giuseppe Scanniello,et al.  Architecture Recovery Using Latent Semantic Indexing and K-Means: An Empirical Evaluation , 2010, 2010 8th IEEE International Conference on Software Engineering and Formal Methods.

[17]  Audris Mockus,et al.  Does Code Decay? Assessing the Evidence from Change Management Data , 2001, IEEE Trans. Software Eng..

[18]  Fionn Murtagh,et al.  A Survey of Recent Advances in Hierarchical Clustering Algorithms , 1983, Comput. J..

[19]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[20]  Giuseppe Scanniello,et al.  Identifying similar pages in Web applications using a competitive clustering algorithm , 2007, J. Softw. Maintenance Res. Pract..

[21]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[22]  Emden R. Gansner,et al.  Using automatic clustering to produce high-level system organizations of source code , 1998, Proceedings. 6th International Workshop on Program Comprehension. IWPC'98 (Cat. No.98TB100242).

[23]  Giuseppe Scanniello,et al.  Identifying similar pages in Web applications using a competitive clustering algorithm: Special Issue Articles , 2007 .

[24]  Yann-Gaël Guéhéneuc,et al.  Recognizing Words from Source Code Identifiers Using Speech Recognition Techniques , 2010, 2010 14th European Conference on Software Maintenance and Reengineering.

[25]  Stéphane Ducasse,et al.  Semantic clustering: Identifying topics in source code , 2007, Inf. Softw. Technol..

[26]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[27]  Richard C. Holt,et al.  Comparison of clustering algorithms in the context of software evolution , 2005, 21st IEEE International Conference on Software Maintenance (ICSM'05).

[28]  Onaiza Maqbool,et al.  Hierarchical Clustering for Software Architecture Recovery , 2007, IEEE Transactions on Software Engineering.

[29]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[30]  T. A. Wiggerts,et al.  Using clustering algorithms in legacy systems remodularization , 1997, Proceedings of the Fourth Working Conference on Reverse Engineering.

[31]  Giuseppe Scanniello,et al.  Using the Kleinberg Algorithm and Vector Space Model for Software System Clustering , 2010, 2010 IEEE 18th International Conference on Program Comprehension.

[32]  Peter J. Rousseeuw,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1990 .

[33]  Vassilios Tzerpos,et al.  An optimal algorithm for MoJo distance , 2003, 11th IEEE International Workshop on Program Comprehension, 2003..

[34]  Arie van Deursen,et al.  Symphony: view-driven software architecture reconstruction , 2004, Proceedings. Fourth Working IEEE/IFIP Conference on Software Architecture (WICSA 2004).

[35]  Dalton Serey Guerrero,et al.  Comparison of Graph Clustering Algorithms for Recovering Software Architecture Module Views , 2009, 2009 13th European Conference on Software Maintenance and Reengineering.

[36]  Giuseppe Scanniello,et al.  A Probabilistic Based Approach towards Software System Clustering , 2010, 2010 14th European Conference on Software Maintenance and Reengineering.

[37]  Andrian Marcus,et al.  Supporting program comprehension using semantic and structural information , 2001, Proceedings of the 23rd International Conference on Software Engineering. ICSE 2001.