Pairwise Clustering by Minimizing the Error of Unsupervised Nearest Neighbor Classification

Pairwise clustering methods, including popular graph-cut-based approaches such as normalized cut, partition the data space into clusters using the pairwise affinity between data points. The success of pairwise clustering therefore depends largely on the pairwise affinity function defined over data points from different clusters. Interpreting the pairwise affinity in a probabilistic framework, we relate pairwise clustering to unsupervised classification by learning a soft Nearest Neighbor (NN) classifier from unlabeled data, and we search for the optimal partition of the data points by minimizing the generalization error of the learned classifier associated with the data partitions. Modeling the underlying data distribution by non-parametric kernel density estimation, the asymptotic generalization error of the unsupervised soft NN classifier involves only the pairwise affinity between data points. Moreover, this error rate reduces to the well-known kernel form of graph cut in the case of a uniform data distribution, which offers another interpretation of the kernel similarity used in Laplacian Eigenmaps [3], which likewise assumes a uniform distribution. By minimizing the generalization error bound, we propose a new clustering algorithm that efficiently partitions the data by inference in a pairwise MRF model. Experimental results demonstrate the effectiveness of our method.
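As a minimal illustrative sketch (not the paper's exact objective or estimator), the leave-one-out error of a soft NN classifier under a candidate partition can be computed from the pairwise affinities alone. The function names, the choice of a Gaussian kernel, and the bandwidth `sigma` are our own assumptions for illustration:

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    # Pairwise Gaussian kernel affinities K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def soft_nn_error(X, labels, sigma=1.0):
    # Leave-one-out soft NN classification: each point is classified by the
    # kernel-weighted votes of all other points, and the error is the average
    # probability mass assigned to the wrong clusters under `labels`.
    labels = np.asarray(labels)
    K = gaussian_affinity(X, sigma)
    np.fill_diagonal(K, 0.0)  # exclude each point's self-affinity
    same = labels[:, None] == labels[None, :]
    correct_mass = (K * same).sum(axis=1)
    total_mass = K.sum(axis=1)
    return float(np.mean(1.0 - correct_mass / total_mass))
```

Because the error depends only on the affinity matrix and the partition, a good partition is one whose within-cluster affinities dominate: two well-separated point clouds with matching labels yield an error near zero, while an arbitrary labeling of the same points yields a large error.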

[1] Jitendra Malik, et al. Normalized cuts and image segmentation, 1997, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2] Masashi Sugiyama, et al. On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution, 2011, ICML.

[3] Mikhail Belkin, et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, 2003, Neural Computation.

[4] Dale Schuurmans, et al. Maximum Margin Clustering, 2004, NIPS.

[5] Geoffrey E. Hinton, et al. Neighbourhood Components Analysis, 2004, NIPS.

[6] J. A. Hartigan, et al. A k-means clustering algorithm, 1979.

[7] L. Hubert, et al. Comparing partitions, 1985.

[8] Michael I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[9] Adrian E. Raftery, et al. Model-Based Clustering, Discriminant Analysis, and Density Estimation, 2002.

[10] Delbert Dueck, et al. Clustering by Passing Messages Between Data Points, 2007, Science.

[11] David Barber, et al. Kernelized Infomax Clustering, 2005, NIPS.

[12] Polina Golland, et al. Convex Clustering with Exemplar-Based Models, 2007, NIPS.

[13] E. Giné, et al. Rates of strong uniform consistency for multivariate kernel density estimators, 2002.

[14] Thomas S. Huang, et al. Pairwise Exemplar Clustering, 2012, AAAI.

[15] David J. C. MacKay, et al. Unsupervised Classifiers, Mutual Information and 'Phantom Targets', 1991, NIPS.

[16] Richard M. Leahy, et al. An Optimal Graph Theoretic Approach to Data Clustering: Theory and Its Application to Image Segmentation, 1993, IEEE Trans. Pattern Anal. Mach. Intell.

[17] Tomer Hertz, et al. Pairwise Clustering and Graphical Models, 2003, NIPS.

[18] Andreas Krause, et al. Discriminative Clustering by Regularized Information Maximization, 2010, NIPS.

[19] Peter E. Hart, et al. Nearest neighbor pattern classification, 1967, IEEE Trans. Inf. Theory.

[20] Nicolas Le Roux, et al. Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering, 2003, NIPS.