A Graph-Theoretic Clustering Methodology Based on Vertex-Attack Tolerance

We consider a schema for graph-theoretic clustering of data using a node-based resilience measure called vertex attack tolerance (VAT). Resilience measures identify worst-case (critical) attack sets of edges or nodes whose removal splits a network into separate connected components. The resulting components form the basis for candidate clusters, and the critical sets of edges or nodes form the intercluster boundaries. Given a graph representation G of the data with node set V, the vertex attack tolerance of G is $\tau(G) = \min_{S \subset V} \frac{|S|}{|V - S - C_{\max}(V - S)| + 1}$, where $C_{\max}(V - S)$ is the largest connected component remaining in the graph after the removal of the critical node set S. We propose three principal variations of VAT-based clustering: a hierarchical variation (hier-VAT-Clust), a non-hierarchical variation (VAT-Clust), and a partial variation (partial-VAT-Clust). The hierarchical implementation yielded the best results on both synthetic and real datasets. Partial-VAT-Clust is useful for noisy data, as it attempts to remove the noise while clustering the actual data. We also explore possible graph representation options, such as geometric and k-nearest-neighbor graphs, and discuss them in the context of clustering efficiency and accuracy.
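
To make the VAT definition above concrete, the following is a minimal Python sketch that evaluates the ratio by exhaustive search on a tiny graph. It is an illustration only, not the authors' implementation: the function names `components` and `vat_bruteforce` and the example graph are hypothetical, the search is exponential in the number of nodes, and it assumes that S ranges over nonempty proper subsets of V (an empty S would make the ratio trivially zero).

```python
from fractions import Fraction
from itertools import combinations


def components(adj, removed):
    """Connected components (as sets) of the graph after deleting `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        seen.add(start)
        while stack:  # iterative depth-first search
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def vat_bruteforce(adj):
    """Minimize |S| / (|V - S - C_max(V - S)| + 1) over nonempty proper S.

    Assumption: S must be nonempty. Returns (ratio, S, components after
    removing S), or None if the graph has fewer than two nodes.
    """
    nodes = list(adj)
    best = None
    for k in range(1, len(nodes)):
        for S in combinations(nodes, k):
            comps = components(adj, S)
            largest = max(len(c) for c in comps)
            outside = len(nodes) - k - largest  # |V - S - C_max(V - S)|
            ratio = Fraction(k, outside + 1)
            if best is None or ratio < best[0]:
                best = (ratio, set(S), comps)
    return best


# Hypothetical example: two triangles joined through node 3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
ratio, S, comps = vat_bruteforce(adj)
print(ratio)                          # 1/4
print(sorted(S))                      # [3]
print(sorted(sorted(c) for c in comps))
```

In this example the critical set is the single bridging node, and the two triangles left behind are the candidate clusters, which mirrors the role the abstract assigns to critical sets and remaining components. Real datasets would need a heuristic or approximation in place of the exhaustive search.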
