A More Relaxed Model for Graph-Based Data Clustering: s-Plex Cluster Editing

We introduce the $s$-Plex Cluster Editing problem as a generalization of the well-studied Cluster Editing problem; both are NP-hard and both are motivated by graph-based data clustering. Cluster Editing asks for a minimum number of edge modifications that transform a given graph into a disjoint union of cliques. In $s$-Plex Cluster Editing, the task is instead to transform the graph, again by a minimum number of edge modifications, into a cluster graph consisting of a disjoint union of so-called $s$-plexes. Herein, an $s$-plex is a vertex set $S$ inducing a subgraph in which every vertex has degree at least $|S|-s$; cliques are exactly the 1-plexes. For $s\geq 2$, $s$-plexes model a more relaxed cluster notion than cliques, better reflecting inaccuracies of the input data. We develop a provably effective preprocessing based on data reduction (yielding a so-called problem kernel), a forbidden subgraph characterization of $s$-plex cluster graphs, and a depth-bounded search tree that is used to find optimal edge modification sets. Altogether, this yields efficient algorithms when the number of edge modifications is moderate, which is often a reasonable assumption under a maximum parsimony model for data clustering.
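
To make the definition concrete, the following minimal Python sketch checks whether a vertex set is an $s$-plex and whether a graph is an $s$-plex cluster graph. It is an illustration only, not the algorithm developed in the paper: the function names `is_s_plex` and `is_s_plex_cluster_graph` are hypothetical, and the sketch assumes that "disjoint union of $s$-plexes" means that every connected component is an $s$-plex.

```python
def build_adjacency(vertices, edges):
    """Return an adjacency map for an undirected simple graph."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_s_plex(subset, adj, s):
    """Check the definition: every vertex of `subset` has degree at
    least |subset| - s in the subgraph induced by `subset`."""
    subset = set(subset)
    return all(len(adj[v] & subset) >= len(subset) - s for v in subset)


def is_s_plex_cluster_graph(vertices, edges, s):
    """Assumption: a graph is an s-plex cluster graph if each of its
    connected components is an s-plex."""
    adj = build_adjacency(vertices, edges)
    seen = set()
    for start in vertices:
        if start in seen:
            continue
        # Collect the connected component of `start` by depth-first search.
        component = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        if not is_s_plex(component, adj, s):
            return False
    return True


# A path on three vertices is not a clique (1-plex) but is a 2-plex.
path_vertices = ["a", "b", "c"]
path_edges = [("a", "b"), ("b", "c")]
print(is_s_plex_cluster_graph(path_vertices, path_edges, 1))
print(is_s_plex_cluster_graph(path_vertices, path_edges, 2))
```

In the example, the path on three vertices would require one edge modification under Cluster Editing (for instance, inserting the edge between `a` and `c`), whereas for $s=2$ it already counts as a valid cluster, which illustrates the relaxation.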
