Semi-supervised spectral clustering with automatic propagation of pairwise constraints

In our data-driven world, clustering is of major importance in helping end-users and decision makers understand information structures. Supervised learning techniques rely on ground truth to perform classification and are often subject to overfitting. Unsupervised clustering techniques, on the other hand, study the structure of the data without access to any training data. Given the difficulty of the task, unsupervised learning tends to yield poorer results than supervised learning. A compromise is to use supervision only for some of the ambiguous classes, in order to boost performance. In this context, this paper studies the impact of pairwise constraints on unsupervised spectral clustering. We introduce a new generalization of constraint propagation that maximizes partitioning quality while reducing annotation costs. Experiments show the efficiency of the proposed scheme.
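
To make the idea of pairwise constraints concrete, the following is a minimal sketch, not the propagation scheme proposed in this paper. It assumes a Gaussian affinity, a plain transitive closure of must-link and cannot-link pairs as the propagation step, and a two-way cut on the normalized Laplacian. The function names (`propagate_constraints`, `constrained_affinity`, `spectral_bipartition`) and the parameter `sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np


def propagate_constraints(n, must_link, cannot_link):
    """Transitive closure of pairwise constraints (assumed consistent)."""
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Must-link is an equivalence relation: merge linked points.
    for i, j in must_link:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Every pair inside a merged group is must-linked.
    ml = {(a, b) for g in groups.values() for a in g for b in g if a < b}

    # A cannot-link between two points extends to their whole groups.
    cl = set()
    for i, j in cannot_link:
        for a in groups[find(i)]:
            for b in groups[find(j)]:
                cl.add((min(a, b), max(a, b)))
    return ml, cl


def constrained_affinity(X, must_link, cannot_link, sigma=1.0):
    """Gaussian affinity, overwritten on constrained pairs."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0  # force maximal similarity
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0  # force zero similarity
    return W


def spectral_bipartition(W):
    """Two-way split from the second eigenvector of the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)


# Toy data: two small groups of points (illustrative values only).
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
              [3.0, 0.0], [3.2, 0.0], [3.0, 0.2]])
ml, cl = propagate_constraints(len(X), [(0, 1), (1, 2)], [(2, 3)])
labels = spectral_bipartition(constrained_affinity(X, ml, cl))
print(sorted(ml), sorted(cl), labels)
```

In this sketch, two annotated must-link pairs and one cannot-link pair expand into six constrained pairs without further annotation, which is the kind of saving in annotation cost that constraint propagation aims for.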
