Scalable Constrained Spectral Clustering via the Randomized Projected Power Method

Constrained spectral clustering is an important area with many applications. However, most previous work has only been applied to relatively small data sets: graphs with thousands of points. This prevents such methods from being applied to the large data sets found in application domains such as medical imaging and document data. Recent work on constrained and unconstrained spectral clustering has explored the scalability of these methods via data approximations such as the Nyström method, which requires the selection of landmarks. However, compressing a graph may lead to undesirable results and poses the additional problem of how to choose landmarks. Instead, in this paper we propose a fast and scalable numerical algorithm for the constrained clustering problem. We establish the convergence and stability of our approach by proving its rate of convergence, and we demonstrate its effectiveness with empirical results on several real data sets. Our approach achieves accuracy comparable to popular constrained spectral clustering algorithms while running several hundred times faster.
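The abstract does not state the algorithm itself. As a rough, hypothetical illustration of the kind of computation the title refers to, here is a generic projected power iteration with a random starting vector. It approximates the leading eigenvector of a symmetric matrix restricted to the subspace {v : C^T v = 0} defined by homogeneous linear constraints. The function `projected_power_method`, the toy two-triangle graph, the spectral shift by the identity, and the encoding of a must-link as v_i - v_j = 0 are all assumptions made for this sketch. It is not the authors' algorithm and carries none of their convergence guarantees.

```python
import numpy as np


def projected_power_method(A, C, num_iters=1000, tol=1e-10, seed=0):
    """Leading eigenvector of symmetric A restricted to {v : C.T @ v = 0}.

    Generic projected power iteration (illustrative sketch, not the paper's
    method). Assumes C has full column rank and that the largest eigenvalue of
    A on the feasible subspace dominates in magnitude (e.g. A is PSD).
    """
    rng = np.random.default_rng(seed)
    # Orthonormal basis of the constraint directions (reduced QR).
    Q, _ = np.linalg.qr(C)

    def project(x):
        # Remove the components along the constraint directions.
        return x - Q @ (Q.T @ x)

    # Randomized start: a Gaussian vector projected onto the feasible subspace.
    v = project(rng.standard_normal(A.shape[0]))
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        # One power step followed by projection and renormalization.
        w = project(A @ v)
        w /= np.linalg.norm(w)
        # Stop once the direction no longer changes (up to sign).
        if min(np.linalg.norm(w - v), np.linalg.norm(w + v)) < tol:
            return w
        v = w
    return v


# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by a weak edge.
W = np.zeros((6, 6))
for i, j, weight in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                     (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = weight
d = W.sum(axis=1)
# Normalized affinity D^{-1/2} W D^{-1/2}, shifted by I so its spectrum lies in [0, 2].
A = W / np.sqrt(np.outer(d, d)) + np.eye(6)
# Constraint 1: exclude the trivial eigenvector D^{1/2} 1.
# Constraint 2 (assumed encoding): a must-link between nodes 0 and 3 as v[0] - v[3] = 0.
must_link = np.zeros(6)
must_link[0], must_link[3] = 1.0, -1.0
C = np.column_stack([np.sqrt(d), must_link])

v = projected_power_method(A, C)
labels = (v > 0).astype(int)  # two-way partition by sign
print(labels)
```

Projecting after every multiplication keeps each iterate inside the feasible subspace, so no landmark selection or graph compression is involved. In a large sparse setting, `A @ v` would be a sparse matrix-vector product, which is what makes power-type iterations attractive for scalability.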
