Solution stability in linear programming relaxations: graph partitioning and unsupervised learning

We propose a new method to quantify the solution stability of a large class of combinatorial optimization problems arising in machine learning. As practical examples, we apply the method to correlation clustering, clustering aggregation, modularity clustering, and relative performance significance clustering. Our method is motivated by linear programming relaxations. We prove that when a relaxation is used to solve the original clustering problem, the solution stability calculated by our method is conservative; that is, it never overestimates the solution stability of the true, unrelaxed problem. We also demonstrate how our method can be used to compute the entire path of optimal solutions as the optimization problem is increasingly perturbed. Experimentally, our method performs well on a number of benchmark problems.
