Active spectral clustering via iterative uncertainty reduction

Spectral clustering is a widely used method for organizing data that only relies on pairwise similarity measurements. This makes its application to non-vectorial data straight-forward in principle, as long as all pairwise similarities are available. However, in recent years, numerous examples have emerged in which the cost of assessing similarities is substantial or prohibitive. We propose an active learning algorithm for spectral clustering that incrementally measures only those similarities that are most likely to remove uncertainty in an intermediate clustering solution. In many applications, similarities are not only costly to compute, but also noisy. We extend our algorithm to maintain running estimates of the true similarities, as well as estimates of their accuracy. Using this information, the algorithm updates only those estimates which are relatively inaccurate and whose update would most likely remove clustering uncertainty. We compare our methods on several datasets, including a realistic example where similarities are expensive and noisy. The results show a significant improvement in performance compared to the alternatives.

[1]  Ian Davidson,et al.  Active Spectral Clustering , 2010, 2010 IEEE International Conference on Data Mining.

[2]  Petros Drineas,et al.  Pass efficient algorithms for approximating large matrices , 2003, SODA '03.

[3]  Nello Cristianini,et al.  Query Learning with Large Margin Classi ersColin , 2000 .

[4]  Nebojsa Jojic,et al.  Structural epitome: a way to summarize one's visual experience , 2010, NIPS.

[5]  Ohad Shamir,et al.  Spectral Clustering on a Budget , 2011, AISTATS.

[6]  Dimitris Achlioptas,et al.  Fast computation of low rank matrix approximations , 2001, STOC '01.

[7]  V. N. Bogaevski,et al.  Matrix Perturbation Theory , 1991 .

[8]  Ling Huang,et al.  Spectral Clustering with Perturbed Data , 2008, NIPS.

[9]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[10]  Rong Jin,et al.  Active query selection for semi-supervised clustering , 2008, 2008 19th International Conference on Pattern Recognition.

[11]  Lehel Csató,et al.  Active Learning with Clustering , 2011, Active Learning and Experimental Design @ AISTATS.

[12]  Matthias W. Seeger,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[13]  Jitendra Malik,et al.  Spectral grouping using the Nystrom method , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Petros Drineas,et al.  On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning , 2005, J. Mach. Learn. Res..

[16]  Alan M. Frieze,et al.  Fast monte-carlo algorithms for finding low-rank approximations , 2004, JACM.

[17]  A. Rahimi,et al.  Clustering with Normalized Cuts is Clustering with a Hyperplane , 2004 .

[18]  Petros Drineas,et al.  FAST MONTE CARLO ALGORITHMS FOR MATRICES II: COMPUTING A LOW-RANK APPROXIMATION TO A MATRIX∗ , 2004 .

[19]  Marie desJardins,et al.  Active Constrained Clustering by Examining Spectral Eigenvectors , 2005, Discovery Science.

[20]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[21]  Daphne Koller,et al.  Support Vector Machine Active Learning with Applications to Text Classification , 2000, J. Mach. Learn. Res..

[22]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[23]  Thomas Gärtner,et al.  A survey of kernels for structured data , 2003, SKDD.

[24]  Domonkos Tikk,et al.  Investigation of Various Matrix Factorization Methods for Large Recommender Systems , 2008, 2008 IEEE International Conference on Data Mining Workshops.

[25]  Atsushi Imiya,et al.  Fast Spectral Clustering with Random Projection and Sampling , 2009, MLDM.