Landmark selection for spectral clustering based on Weighted PageRank

Abstract: Spectral clustering methods have many real-world applications, such as face recognition, community detection, and protein sequence clustering. Although spectral clustering can detect arbitrarily shaped clusters and thus achieves high clustering accuracy, its heavy computational cost limits its scalability. In this paper, we propose an accelerated spectral clustering method based on landmark selection. The most important nodes of the data affinity graph, as ranked by the Weighted PageRank algorithm, are selected as landmarks. The selected landmarks are then passed to a landmark spectral clustering technique to achieve scalable and accurate clustering. In experiments on two benchmark image data sets of faces and shapes, we examine several landmark selection strategies for scalable spectral clustering that either ignore or consider the topological properties of the data in the affinity graph. We also show that the proposed method outperforms baseline spectral clustering methods in terms of computational cost and accelerated spectral clustering methods in terms of clustering accuracy. Finally, we outline future directions in spectral clustering.
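
The pipeline summarized in the abstract (rank the nodes of the affinity graph, keep the top-ranked ones as landmarks, then cluster through a landmark-based representation) can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation. The affinity is assumed to be a dense Gaussian kernel. The ranking is a plain edge-weighted PageRank computed by power iteration, which is a simplification and not necessarily the Weighted PageRank formulation used in the paper. The embedding step follows the general idea of landmark-based spectral clustering (sparse landmark codes, then an SVD). All function names and the parameters `sigma`, `damping`, `p`, `r`, and `k` are hypothetical choices made for this example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel values between rows of A (m x d) and rows of B (n x d)."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def edge_weighted_pagerank(W, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank on a weighted graph via power iteration (simplified stand-in)."""
    n = W.shape[0]
    out = W.sum(axis=1)
    safe = np.where(out > 0, out, 1.0)
    # Row-stochastic transitions; nodes without outgoing weight jump uniformly.
    P = np.where(out[:, None] > 0, W / safe[:, None], 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (P.T @ rank)
        converged = np.abs(new_rank - rank).sum() < tol
        rank = new_rank
        if converged:
            break
    return rank / rank.sum()

def select_landmarks(X, p, sigma=1.0):
    """Indices of the p highest-ranked points in the affinity graph."""
    W = gaussian_kernel(X, X, sigma)
    np.fill_diagonal(W, 0.0)  # no self-loops
    scores = edge_weighted_pagerank(W)
    return np.argsort(-scores)[:p]

def landmark_embedding(X, landmarks, k, r=3, sigma=1.0):
    """Spectral embedding (n x k) built from sparse landmark codes."""
    Z = gaussian_kernel(X[landmarks], X, sigma)  # p x n
    p = Z.shape[0]
    if r < p:
        # Keep only the r largest affinities per data point (column).
        drop = np.argsort(-Z, axis=0)[r:, :]
        np.put_along_axis(Z, drop, 0.0, axis=0)
    Z /= np.maximum(Z.sum(axis=0, keepdims=True), 1e-12)
    d = np.maximum(Z.sum(axis=1), 1e-12)
    Z_hat = Z / np.sqrt(d)[:, None]
    # Right singular vectors of Z_hat act as the spectral embedding.
    _, _, Vt = np.linalg.svd(Z_hat, full_matrices=False)
    return Vt[:k].T

# Toy usage: two Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(3.0, 0.5, (20, 2))])
landmarks = select_landmarks(X, p=6)
embedding = landmark_embedding(X, landmarks, k=2)
```

The rows of `embedding` can then be passed to any k-means routine to obtain cluster labels. Two caveats apply to this sketch: building the full n x n affinity matrix costs quadratic time and memory, so it only suits small inputs, and taking the top-ranked nodes alone can concentrate landmarks in the densest region of the graph.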
