Semi-Supervised Spectral Clustering Using the Signed Laplacian

Data clustering is an important step in numerous real-world problems. The goal is to separate the data into disjoint subgroups (clusters) according to some similarity metric. We consider spectral clustering (SC), where a graph captures the relation between the individual data points and the clusters are obtained from the spectrum of the associated graph Laplacian. We propose a semi-supervised SC scheme that exploits partial knowledge of the true cluster labels. These labels are used to create a modified graph with attractive intra-cluster edges (positive weights) and repulsive inter-cluster edges (negative weights). We then perform spectral clustering using the signed Laplacian matrix of the resulting signed graph. Numerical experiments illustrate the performance improvements achievable with our method.

[1]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[2]  Wei Liu,et al.  Robust and Scalable Graph-Based Semisupervised Learning , 2012, Proceedings of the IEEE.

[3]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[4]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[5]  Vassilis Kalofolias,et al.  How to Learn a Graph from Smooth Signals , 2016, AISTATS.

[6]  José M. F. Moura,et al.  Big Data Analysis with Signal Processing on Graphs: Representation and processing of massive data sets with irregular structure , 2014, IEEE Signal Processing Magazine.

[7]  Pascal Frossard,et al.  The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains , 2012, IEEE Signal Processing Magazine.

[8]  Gerald Matz,et al.  Graph Learning Based on Total Variation Minimization , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9]  Shih-Fu Chang,et al.  Graph construction and b-matching for semi-supervised learning , 2009, ICML '09.

[10]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[11]  Dan Klein,et al.  Spectral Learning , 2003, IJCAI.

[12]  Sahin Albayrak,et al.  Spectral Analysis of Signed Graphs for Clustering, Prediction and Visualization , 2010, SDM.

[13]  Pascal Frossard,et al.  Learning Laplacian Matrix in Smooth Graph Signal Representations , 2014, IEEE Transactions on Signal Processing.

[14]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[15]  Andrew B. Kahng,et al.  New spectral methods for ratio cut partitioning and clustering , 1991, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[16]  Yuji Matsumoto,et al.  Using the Mutual k-Nearest Neighbor Graphs for Semi-supervised Classification on Natural Language Data , 2011, CoNLL.

[17]  Ian Davidson,et al.  On constrained spectral clustering and its applications , 2012, Data Mining and Knowledge Discovery.

[18]  Fei Wang,et al.  Label Propagation through Linear Neighborhoods , 2008, IEEE Trans. Knowl. Data Eng..

[19]  Jianbo Shi,et al.  Multiclass spectral clustering , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[20]  Charu C. Aggarwal,et al.  Graph Clustering , 2010, Encyclopedia of Machine Learning and Data Mining.