Supervised software modularisation

This paper addresses the challenge of reorganising a software system into modules that both obey sound design principles and make sense to domain experts. The problem has given rise to several unsupervised automated approaches based on techniques such as clustering and Formal Concept Analysis. Although the results of these approaches are often partially correct, they usually need refinement so that the developer can integrate domain knowledge. This paper presents the SUMO algorithm, which complements existing techniques by enabling the maintainer to refine the results those techniques produce. The algorithm is guaranteed to eventually yield a result that satisfies the maintainer, and an evaluation on a diverse range of systems shows that this requires a reasonably low amount of effort.
