Clustering Images by Unmasking – A New Baseline

We propose a novel agglomerative clustering method based on unmasking, a technique that was previously used for authorship verification of text documents and for abnormal event detection in videos. In order to join two clusters, we alternate between (i) training a binary classifier to distinguish between the samples from one cluster and the samples from the other cluster, and (ii) removing at each step the most discriminant features. The faster-decreasing accuracy rates of the intermediately-obtained classifiers indicate that the two clusters should be joined. To the best of our knowledge, this is the first work to apply unmasking in order to cluster images. We compare our method with k-means as well as a recent state-of-the-art clustering method. The empirical results indicate that our approach is able to improve performance for various (deep and shallow) feature representations and different tasks, such as handwritten digit recognition, texture classification and fine-grained object recognition.

[1]  Christian Böhm,et al.  Computing Clusters of Correlation Connected objects , 2004, SIGMOD '04.

[2]  Yi Yang,et al.  Image Clustering Using Local Discriminant Models and Global Integration , 2010, IEEE Transactions on Image Processing.

[3]  Andrew McCallum,et al.  Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.

[4]  Radu Tudor Ionescu,et al.  Unmasking the Abnormal Events in Video , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Joshua Zhexue Huang,et al.  Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values , 1998, Data Mining and Knowledge Discovery.

[6]  Jiye Liang,et al.  An Algorithm for Clustering Categorical Data With Set-Valued Features , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[7]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[8]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[9]  Enhong Chen,et al.  Learning Deep Representations for Graph Clustering , 2014, AAAI.

[10]  Elke Achtert,et al.  Finding Hierarchies of Subspace Clusters , 2006, PKDD.

[11]  Moshe Koppel,et al.  Measuring Differentiability: Unmasking Pseudonymous Authors , 2007, J. Mach. Learn. Res..

[12]  Bo Yang,et al.  Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering , 2016, ICML.

[13]  Liviu P. Dinu,et al.  Clustering Based on Rank Distance with Applications on DNA , 2012, ICONIP.

[14]  Michal Irani,et al.  “Clustering by Composition”—Unsupervised Discovery of Image Categories , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[16]  Matthijs Douze,et al.  Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[17]  Tong Zhang,et al.  Deep Subspace Clustering Networks , 2017, NIPS.

[18]  Cordelia Schmid,et al.  A sparse texture representation using local affine regions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Andrew Zisserman,et al.  A Visual Vocabulary for Flower Classification , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[20]  Liviu P. Dinu,et al.  Clustering based on median and closest string via rank distance with applications on DNA , 2013, Neural Computing and Applications.

[21]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[22]  Ali Farhadi,et al.  Unsupervised Deep Embedding for Clustering Analysis , 2015, ICML.

[23]  Hans-Peter Kriegel,et al.  Density-Connected Subspace Clustering for High-Dimensional Data , 2004, SDM.

[24]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[25]  Feiping Nie,et al.  Rank-Constrained Spectral Clustering With Flexible Embedding , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[26]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[27]  Hans-Peter Kriegel,et al.  Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering , 2009, TKDD.

[28]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[29]  Andrew Zisserman,et al.  Image Classification using Random Forests and Ferns , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[30]  Radu Tudor Ionescu,et al.  PQ kernel: A rank correlation kernel for visual word histograms , 2015, Pattern Recognit. Lett..

[31]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[32]  Jianping Yin,et al.  Improved Deep Embedded Clustering with Local Structure Preservation , 2017, IJCAI.