Column generation approaches for the software clustering problem

This work presents the application of branch-and-price approaches to the automatic version of the Software Clustering Problem. To tackle this problem, we apply the Dantzig–Wolfe decomposition to a formulation from the literature. We then present two Column Generation (CG) approaches for solving the linear programming relaxation of the resulting reformulation: the standard CG approach and a new approach, which we call Staged Column Generation (SCG). We also propose a modification to the pricing subproblem that allows multiple columns to be added at each CG iteration. We test our algorithms on a set of 45 instances from the literature. The proposed approaches improve on the results in the literature, solving all of these instances to optimality. Furthermore, as instance size grows, the SCG approach considerably outperforms the standard CG in computational time, number of iterations, and number of generated columns.
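For readers unfamiliar with column generation, the following is a generic sketch of a restricted master problem and its pricing test. It is not the formulation used in this work. It assumes, purely for illustration, a set-partitioning master in which each column $k$ is a candidate cluster, $c_k$ is its contribution to the objective, $a_{ik} = 1$ if module $i$ belongs to cluster $k$ (and $0$ otherwise), $V$ is the set of modules, and $K' \subseteq K$ is the subset of columns generated so far. All of these symbols are assumptions introduced here.

```latex
% Restricted master problem (LP relaxation) over the current column set K'
\begin{align*}
\max \quad & \sum_{k \in K'} c_k \lambda_k \\
\text{s.t.} \quad & \sum_{k \in K'} a_{ik} \lambda_k = 1, && \forall i \in V, \\
& \lambda_k \ge 0, && \forall k \in K'.
\end{align*}

% Pricing: with dual values \pi_i of the partitioning constraints,
% a column k \in K improves the master only if its reduced cost is positive
\[
\bar{c}_k \;=\; c_k - \sum_{i \in V} \pi_i a_{ik} \;>\; 0 .
\]
```

Under these assumptions, a standard CG iteration adds one column that maximizes $\bar{c}_k$, whereas a multiple-column variant would add several columns with $\bar{c}_k > 0$ found during the same pricing round. The procedure stops when no column with positive reduced cost exists.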
