Software botryology. Automatic clustering of software systems

It has long been recognized that the decomposition of a large software system into "meaningful" subsystems is essential for both the development and maintenance phases of a software project. We introduce the term "software botryology" for the area of research that attempts to automatically cluster a software system ("botrys" is the ancient Greek word for a cluster of grapes). In this paper, we survey approaches to the clustering problem from researchers in the software engineering community. We also present clustering techniques used in other disciplines and argue that their utilization in a software context could lead to better solutions to the software clustering problem. Finally, we outline research challenges and open problems of interest.
