Excisive Hierarchical Clustering Methods for Network Data

We introduce two practical properties of hierarchical clustering methods for (possibly asymmetric) network data: excisiveness and linear scale preservation. Excisiveness ensures local consistency of the clustering outcome, whereas linear scale preservation makes the outcome invariant to changes in the units of measure. Algorithmically, excisiveness implies that we can reduce computational complexity by clustering only a data subset of interest, with the theoretical guarantee that the same hierarchical outcome would be observed when clustering the whole dataset. Moreover, we introduce the concept of representability, i.e., a generative model for describing clustering methods through the specification of their action on a collection of networks. We further show that, within a rich set of admissible methods, requiring representability is equivalent to requiring both excisiveness and linear scale preservation. Leveraging this equivalence, we show that all excisive and linear scale preserving methods can be factored into two steps: a transformation of the weights in the input network followed by the application of a canonical clustering method. Furthermore, this factorization can be used to show stability of excisive and linear scale preserving methods, in the sense that a bounded perturbation in the input network entails a bounded perturbation in the clustering output.
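As a rough illustration of the two properties, the sketch below checks them numerically for single linkage on a small symmetric network. Single linkage is used here only as a familiar stand-in and is an assumption of this example: the abstract does not name its canonical method, and the function names `single_linkage_ultrametric` and `cluster_of` are hypothetical. The output ultrametric assigns to each pair of nodes the smallest achievable "largest edge" over all paths joining them.

```python
import numpy as np

def single_linkage_ultrametric(A):
    """Minimax-path ultrametric of a symmetric dissimilarity matrix A.

    Illustrative stand-in only (an assumption of this sketch), not
    necessarily the canonical method referred to in the abstract.
    """
    u = np.array(A, dtype=float)
    np.fill_diagonal(u, 0.0)
    for k in range(u.shape[0]):
        # Allow node k as an intermediate hop; a path costs its largest edge.
        u = np.minimum(u, np.maximum(u[:, [k]], u[[k], :]))
    return u

def cluster_of(u, node, delta):
    """Indices of the nodes merged with `node` at resolution delta."""
    return np.flatnonzero(u[node] <= delta)

# Toy symmetric network on five nodes (made-up weights).
A = np.array([
    [0, 1, 4, 6, 7],
    [1, 0, 2, 6, 8],
    [4, 2, 0, 5, 9],
    [6, 6, 5, 0, 3],
    [7, 8, 9, 3, 0],
], dtype=float)

u_full = single_linkage_ultrametric(A)

# Linear scale preservation: rescaling the input rescales the output.
alpha = 3.0
scale_ok = np.allclose(single_linkage_ultrametric(alpha * A), alpha * u_full)

# Excisiveness: re-clustering only the nodes of one cluster reproduces
# the hierarchy that the full network induces on that cluster.
block = cluster_of(u_full, node=0, delta=2.0)
u_sub = single_linkage_ultrametric(A[np.ix_(block, block)])
excisive_ok = np.allclose(u_sub, u_full[np.ix_(block, block)])

print(scale_ok, excisive_ok)
```

The excisiveness check mirrors the computational point made above: once a cluster of interest is known, its internal hierarchy can be recovered from the subnetwork alone.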
