On a Theory of Nonparametric Pairwise Similarity for Clustering: Connecting Clustering to Classification

Pairwise clustering methods partition the data space into clusters according to the pairwise similarity between data points. The success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, for which kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework that bridges the gap between clustering and multi-class classification. The framework learns an unsupervised nonparametric classifier from each data partition, and searches for the optimal partition of the data by minimizing the generalization error of the learned classifiers associated with the data partitions. We consider two nonparametric classifiers in this framework, namely the nearest neighbor classifier and the plug-in classifier. When the underlying data distribution is modeled by nonparametric kernel density estimation, the generalization error bounds for both unsupervised nonparametric classifiers are sums of nonparametric pairwise similarity terms between the data points, which can be used for clustering. Under a uniform distribution, the nonparametric similarity terms induced by both unsupervised classifiers exhibit a well-known form of kernel similarity. We also prove that the generalization error bound for the unsupervised plug-in classifier is asymptotically equal to the weighted volume of cluster boundary [1] for Low Density Separation, a widely used criterion for semi-supervised learning and clustering. Based on the derived nonparametric pairwise similarity using the plug-in classifier, we propose a new nonparametric exemplar-based clustering method with enhanced discriminative capability, whose superiority is evidenced by the experimental results.
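To make the two ingredients named above concrete, the following is a minimal sketch of a Gaussian kernel similarity matrix, a kernel density estimate evaluated at the data points, and a generic partition cost that sums the kernel similarity between points assigned to different clusters. This is not the generalization error bound derived in the paper: the function names, the bandwidth `h`, the toy data, and the cost itself are all assumptions chosen only to illustrate how a sum of pairwise similarity terms can score a partition.

```python
import numpy as np

def gaussian_kernel_matrix(X, h):
    """Pairwise Gaussian kernel similarity with bandwidth h (hypothetical helper)."""
    # Squared Euclidean distance between every pair of rows of X.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * h ** 2))

def kde_at_points(X, h):
    """Gaussian kernel density estimate evaluated at each data point."""
    n, d = X.shape
    K = gaussian_kernel_matrix(X, h)
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)  # Gaussian normalizing constant
    return K.sum(axis=1) / (n * norm)

def cross_cluster_similarity(K, labels):
    """Sum of kernel similarity over unordered pairs placed in different clusters.

    Assumption: a generic stand-in for a pairwise-similarity objective,
    not the bound derived in the paper.
    """
    labels = np.asarray(labels)
    different = labels[:, None] != labels[None, :]
    return K[different].sum() / 2.0  # each unordered pair appears twice in K

# Toy data (assumed): two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(3.0, 0.3, size=(20, 2)),
])
K = gaussian_kernel_matrix(X, h=0.5)
density = kde_at_points(X, h=0.5)

good_labels = np.repeat([0, 1], 20)  # one label per blob
bad_labels = np.tile([0, 1], 20)     # labels alternate, so each blob is split

good_cost = cross_cluster_similarity(K, good_labels)
bad_cost = cross_cluster_similarity(K, bad_labels)
```

In this sketch the partition that follows the two blobs cuts only low-similarity pairs, so its cost is lower than that of the partition that splits each blob, which is the qualitative behavior a similarity-based clustering objective is expected to show.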

[1]  Dale Schuurmans,et al.  Maximum Margin Clustering , 2004, NIPS.

[2]  Shachar Lovett,et al.  Unsupervised SVMs: On the Complexity of the Furthest Hyperplane Problem , 2012, COLT.

[3]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[4]  Ulrike von Luxburg,et al.  Influence of graph construction on graph-based clustering measures , 2008, NIPS.

[5]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[6]  Tomer Hertz,et al.  Pairwise Clustering and Graphical Models , 2003, NIPS.

[7]  A. W. van der Vaart,et al.  Uniform Central Limit Theorems , 2001 .

[8]  Robert Jenssen,et al.  The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space , 2004, NIPS.

[9]  E. Giné,et al.  Rates of strong uniform consistency for multivariate kernel density estimators , 2002 .

[10]  P. Gänssler.  Weak Convergence and Empirical Processes, by A. W. van der Vaart and J. A. Wellner , 1997 .

[11]  A. Tsybakov,et al.  Fast learning rates for plug-in classifiers , 2007, 0708.2321.

[12]  Jon A. Wellner,et al.  Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[13]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[14]  Zenglin Xu,et al.  Adaptive Regularization for Transductive Support Vector Machine , 2009, NIPS.

[15]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[16]  Masashi Sugiyama,et al.  Information-maximization clustering: analytic solution and model selection (Information-Based Induction Sciences and Machine Learning) , 2011 .

[17]  Uwe Einmahl,et al.  Uniform in bandwidth consistency of kernel-type function estimators , 2005 .

[18]  Yuhong Yang,et al.  Minimax Nonparametric Classification—Part I: Rates of Convergence , 1998 .

[19]  Andreas Krause,et al.  Discriminative Clustering by Regularized Information Maximization , 2010, NIPS.

[20]  László Györfi,et al.  A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[21]  Polina Golland,et al.  Convex Clustering with Exemplar-Based Models , 2007, NIPS.

[22]  Mikhail Belkin,et al.  On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts , 2006, NIPS.

[23]  Ronald Rosenfeld,et al.  Semi-supervised learning with graphs , 2005 .

[24]  Alexander Zien,et al.  Semi-Supervised Classification by Low Density Separation , 2005, AISTATS.

[25]  D. Pollard,et al.  $U$-Processes: Rates of Convergence , 1987 .