Kernel k-Groups via Hartigan’s Method

Energy statistics was proposed by Székely in the 1980s, inspired by Newton's gravitational potential in classical mechanics, and it provides a model-free hypothesis test for equality of distributions. In its original form, energy statistics was formulated in Euclidean spaces; more recently, it was generalized to metric spaces of negative type. In this paper, we formulate the clustering problem using a weighted version of energy statistics in spaces of negative type. We show that this approach leads to a quadratically constrained quadratic program in the associated kernel space, establishing connections with graph partitioning problems and kernel methods in machine learning. To find local solutions of this optimization problem, we propose kernel k-groups, an extension of Hartigan's method to kernel spaces. Kernel k-groups is cheaper than spectral clustering and has the same computational cost as kernel k-means (which is based on Lloyd's heuristic), but our numerical results show improved performance, especially in higher dimensions. Moreover, we verify the efficiency of kernel k-groups for community detection in sparse stochastic block models, a problem with applications in several areas of science.
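
To make the idea of a Hartigan-style local search in kernel space concrete, the following is a minimal sketch and not the authors' implementation. It assumes the unweighted case and maximizes the standard kernel clustering objective, the sum over clusters of (1/n_j) times the sum of K(x, y) over pairs in cluster j, by moving one point at a time whenever the move strictly increases the objective. The function names (`energy_kernel`, `objective`, `kernel_k_groups`), the choice of the origin as base point, and the Euclidean distance are illustrative assumptions.

```python
import numpy as np

def energy_kernel(X, x0=None):
    """Distance-induced kernel K(x,y) = (rho(x,x0) + rho(y,x0) - rho(x,y)) / 2.
    Assumption: rho is the Euclidean distance and x0 defaults to the origin."""
    X = np.asarray(X, dtype=float)
    x0 = np.zeros(X.shape[1]) if x0 is None else np.asarray(x0, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d0 = np.linalg.norm(X - x0, axis=1)
    return 0.5 * (d0[:, None] + d0[None, :] - d)

def objective(K, labels, k):
    """Sum over non-empty clusters of (1/n_j) * sum_{x,y in C_j} K(x,y)."""
    Z = np.eye(k)[np.asarray(labels, dtype=int)]  # one-hot membership, n x k
    sizes = Z.sum(axis=0)
    Q = np.diag(Z.T @ K @ Z)
    keep = sizes > 0
    return float((Q[keep] / sizes[keep]).sum())

def kernel_k_groups(K, labels, k, max_sweeps=100, tol=1e-12):
    """Hartigan-style local search on a symmetric Gram matrix K (unweighted sketch)."""
    K = np.asarray(K, dtype=float)
    labels = np.array(labels, dtype=int)          # copy so the input is untouched
    Z = np.eye(k)[labels]
    sizes = Z.sum(axis=0)
    S = K @ Z                                     # S[i, j] = sum of K[i, y] for y in C_j
    Q = np.einsum('ij,ij->j', Z, S)               # Q[j] = sum of K over pairs in C_j
    for _ in range(max_sweeps):
        moved = False
        for i in range(K.shape[0]):
            a = labels[i]
            if sizes[a] <= 1:                     # never empty a cluster
                continue
            Qa_new = Q[a] - 2.0 * S[i, a] + K[i, i]
            loss = Q[a] / sizes[a] - Qa_new / (sizes[a] - 1)
            Qb_new = Q + 2.0 * S[i] + K[i, i]     # candidate Q for every target cluster
            gain = Qb_new / (sizes + 1) - Q / np.maximum(sizes, 1)
            delta = gain - loss                   # change in objective per target
            delta[a] = -np.inf
            b = int(np.argmax(delta))
            if delta[b] > tol:                    # accept only strict improvements
                Q[a], Q[b] = Qa_new, Qb_new[b]
                sizes[a] -= 1
                sizes[b] += 1
                S[:, a] -= K[:, i]
                S[:, b] += K[:, i]
                labels[i] = b
                moved = True
        if not moved:                             # local optimum reached
            break
    return labels
```

Each accepted move costs O(n) to update the cached row sums, so a full sweep over all points is O(n(n + k)) given the Gram matrix. Because only strictly improving moves are accepted, the objective never decreases, and the search stops at a local optimum that depends on the initial labels.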
