Exact Algorithms for Cluster Editing: Evaluation and Experiments

Abstract

The Cluster Editing problem is defined as follows: Given an undirected, loopless graph, we want to find a set of edge modifications (insertions and deletions) of minimum cardinality, such that the modified graph consists of disjoint cliques.

We present empirical results for this problem using exact methods from fixed-parameter algorithmics and linear programming. We investigate parameter-independent data reduction methods and find that effective preprocessing is possible if the number of edge modifications $k$ is smaller than some multiple of $\lvert V\rvert$, where $V$ is the vertex set of the input graph. In particular, combining parameter-dependent data reduction with lower and upper bounds, we can effectively reduce graphs satisfying $k \leq 25\lvert V\rvert$.

In addition to the fastest known fixed-parameter branching strategy for the problem, we investigate an integer linear program (ILP) formulation of the problem using a cutting plane approach. Our results indicate that both approaches are capable of solving large graphs with 1000 vertices and several thousand edge modifications. For the first time, complex and very large graphs such as biological instances allow for an exact solution, using a combination of the above techniques.

(A preliminary version of this paper appeared under the title “Exact algorithms for cluster editing: Evaluation and experiments” in the Proceedings of the 7th Workshop on Experimental Algorithms, WEA 2008, in: LNCS, vol. 5038, Springer, pp. 289–302.)
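To make the problem definition concrete, the following is a minimal Python sketch of the textbook search-tree idea for Cluster Editing. It is not the branching strategy or the ILP evaluated in the paper, and it includes no data reduction. It relies on the standard fact that a graph is a disjoint union of cliques exactly when it has no "conflict triple": three vertices u, v, w with edges uv and vw but no edge uw. The function names (`find_conflict`, `cluster_edit`, `min_cluster_edit`) and the edge representation as a set of `frozenset` pairs are assumptions made for this illustration.

```python
from itertools import combinations


def find_conflict(vertices, edges):
    """Return (u, v, w) with uv and vw present but uw absent, or None."""
    for v in vertices:
        nbrs = [u for u in vertices if u != v and frozenset((u, v)) in edges]
        for u, w in combinations(nbrs, 2):
            if frozenset((u, w)) not in edges:
                return u, v, w
    return None


def cluster_edit(vertices, edges, k):
    """Return a set of at most k vertex pairs whose toggling yields
    disjoint cliques, or None if no such set exists (naive 3-way branching)."""
    conflict = find_conflict(vertices, edges)
    if conflict is None:
        return set()          # already a disjoint union of cliques
    if k == 0:
        return None           # conflict remains but budget is spent
    u, v, w = conflict
    # Any solution must modify at least one of the three pairs.
    for pair in (frozenset((u, v)), frozenset((v, w)), frozenset((u, w))):
        sub = cluster_edit(vertices, edges ^ {pair}, k - 1)
        if sub is not None:
            # Symmetric difference keeps only the net modifications.
            return sub ^ {pair}
    return None


def min_cluster_edit(vertices, edges):
    """Try k = 0, 1, 2, ... until a solution is found (exponential time)."""
    max_k = len(vertices) * (len(vertices) - 1) // 2
    for k in range(max_k + 1):
        solution = cluster_edit(vertices, edges, k)
        if solution is not None:
            return solution
    return None


if __name__ == "__main__":
    # Hypothetical toy instance: a triangle {1, 2, 3} with a pendant vertex 4.
    V = [1, 2, 3, 4]
    E = {frozenset(p) for p in [(1, 2), (1, 3), (2, 3), (3, 4)]}
    print(min_cluster_edit(V, E))
```

This naive branching explores up to 3^k subproblems per budget k, so it is practical only for tiny instances; the data reduction, bounds, and refined branching discussed in the abstract are what make larger graphs tractable.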
