Deep learning vs spectral clustering into an active clustering with pairwise constraints propagation

In our data-driven world, categorization is of major importance in helping end users and decision makers understand information structures. Supervised learning techniques rely on annotated samples that are often difficult to obtain, and training often overfits. Unsupervised clustering techniques, on the other hand, study the structure of the data without access to any annotated training data. Given the difficulty of the task, supervised learning often outperforms unsupervised learning. A compromise is to use partial knowledge, selected in a smart way, to boost performance while minimizing learning costs; this approach is called semi-supervised learning. In this setting, Spectral Clustering has proved to be an efficient method. Deep Learning has also outperformed several state-of-the-art classification approaches, so it is worth testing in our context. In this paper, we first introduce Deep Learning into an active semi-supervised clustering process and compare it with Spectral Clustering. Second, we introduce constraint propagation and demonstrate how it maximizes partitioning quality while reducing annotation costs. Experimental validation is conducted on two different real datasets. Results show the potential of the clustering methods.
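To illustrate how propagating pairwise constraints can reduce annotation costs, the following is a minimal sketch and not the paper's actual procedure, which the abstract does not specify. It assumes the common transitive-closure rule for must-link and cannot-link constraints: items joined by must-link form groups, and a cannot-link between two items extends to every pair drawn from their two groups. The function name `propagate_constraints` and its arguments are hypothetical.

```python
from itertools import combinations


def propagate_constraints(n, must_link, cannot_link):
    """Expand annotated pairwise constraints by transitive closure (assumed rule).

    n           -- number of items, indexed 0..n-1
    must_link   -- iterable of (a, b) pairs annotated as "same cluster"
    cannot_link -- iterable of (a, b) pairs annotated as "different clusters"
    Returns (ml, cl): sets of (i, j) pairs with i < j.
    """
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge items connected by must-link constraints into groups.
    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Every pair inside a group is an implied must-link.
    ml = set()
    for members in groups.values():
        ml.update(combinations(members, 2))

    # A cannot-link between two groups applies to all cross-group pairs.
    cl = set()
    for a, b in cannot_link:
        ra, rb = find(a), find(b)
        if ra == rb:
            raise ValueError(f"contradictory constraints on items {a} and {b}")
        for x in groups[ra]:
            for y in groups[rb]:
                cl.add((min(x, y), max(x, y)))
    return ml, cl


# Three annotated pairs on five items.
ml, cl = propagate_constraints(5, must_link=[(0, 1), (1, 2)], cannot_link=[(2, 3)])
```

In this toy case, three annotated pairs yield six constraints in total (three must-link and three cannot-link), which is the sense in which propagation can stretch a fixed annotation budget.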
