Web-Scale Image Clustering Revisited

Large scale duplicate detection, clustering and mining of documents or images has been conventionally treated with seed detection via hashing, followed by seed growing heuristics using fast search. Principled clustering methods, especially kernelized and spectral ones, have higher complexity and are difficult to scale above millions. Under the assumption of documents or images embedded in Euclidean space, we revisit recent advances in approximate k-means variants, and borrow their best ingredients to introduce a new one, inverted-quantized k-means (IQ-means). Key underlying concepts are quantization of data points and multi-index based inverted search from centroids to cells. Its quantization is a form of hashing and analogous to seed detection, while its updates are analogous to seed growing, yet principled in the sense of distortion minimization. We further design a dynamic variant that is able to determine the number of clusters k in a single run at nearly zero additional cost. Combined with powerful deep learned representations, we achieve clustering of a 100 million image collection on a single machine in less than one hour.

[1]  Atsuto Maki,et al.  Visual Instance Retrieval with Deep Convolutional Networks , 2014, ICLR.

[2]  Ulrike von Luxburg,et al.  A tutorial on spectral clustering , 2007, Stat. Comput..

[3]  Jian Sun,et al.  Optimized Product Quantization for Approximate Nearest Neighbor Search , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[5]  Victor S. Lempitsky,et al.  Neural Codes for Image Retrieval , 2014, ECCV.

[6]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[7]  Yannis Avrithis,et al.  Retrieving landmark and non-landmark images from community photo collections , 2010, ACM Multimedia.

[8]  Wei-keng Liao,et al.  A new scalable parallel DBSCAN algorithm using the disjoint-set data structure , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[9]  Michael I. Jordan,et al.  Revisiting k-means: New Algorithms via Bayesian Nonparametrics , 2011, ICML.

[10]  Jiri Matas,et al.  Large-Scale Discovery of Spatially Related Images , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Fei Yang,et al.  Web scale photo hash clustering on a single machine , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[13]  Stefano Soatto,et al.  Quick Shift and Kernel Methods for Mode Seeking , 2008, ECCV.

[14]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Xin Jin,et al.  Mean Shift , 2017, Encyclopedia of Machine Learning and Data Mining.

[16]  Daniel P. Huttenlocher,et al.  Landmark classification in large-scale image collections , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17]  Sergei Vassilvitskii,et al.  Scalable K-Means by ranked retrieval , 2014, WSDM.

[18]  Qing He,et al.  Parallel K-Means Clustering Based on MapReduce , 2009, CloudCom.

[19]  Jan-Michael Frahm,et al.  Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs , 2008, International Journal of Computer Vision.

[20]  Anil K. Jain,et al.  Unsupervised Learning of Finite Mixture Models , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Andrei Z. Broder,et al.  On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[22]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  Bastian Leibe,et al.  Discovering favorite views of popular places with iconoid shift , 2011, 2011 International Conference on Computer Vision.

[24]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Yannis Avrithis,et al.  To Aggregate or Not to aggregate: Selective Match Kernels for Image Search , 2013, 2013 IEEE International Conference on Computer Vision.

[26]  Leonidas J. Guibas,et al.  Image webs: Computing and exploiting connectivity in image collections , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27]  Andrew Zisserman,et al.  Triangulation Embedding and Democratic Aggregation for Image Search , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Ting Liu,et al.  Clustering Billions of Images with Large Scale Nearest Neighbor Search , 2007, 2007 IEEE Workshop on Applications of Computer Vision (WACV '07).

[29]  Victor Lempitsky,et al.  The inverted multi-index , 2012, CVPR.

[30]  David A. Shamma,et al.  YFCC100M , 2015, Commun. ACM.

[31]  Bastian Leibe,et al.  An Evaluation of Two Automatic Landmark Building Discovery Algorithms for City Reconstruction , 2010, ECCV Workshops.

[32]  Yannis Avrithis,et al.  Approximate Gaussian Mixtures for Large Scale Vocabularies , 2012, ECCV.

[33]  Jon M. Kleinberg,et al.  Mapping the world's photos , 2009, WWW '09.

[34]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Alan M. Frieze,et al.  Min-Wise Independent Permutations , 2000, J. Comput. Syst. Sci..

[36]  Steven M. Seitz,et al.  Scene Summarization for Online Image Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[37]  Mor Naaman,et al.  How flickr helps us make sense of the world: context and content in community-contributed media collections , 2007, ACM Multimedia.

[38]  Yannis Avrithis Quantize and Conquer: A Dimensionality-Recursive Solution to Clustering, Vector Quantization, and Image Retrieval , 2013, 2013 IEEE International Conference on Computer Vision.

[39]  Takeo Kanade,et al.  Mode-seeking by Medoidshifts , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[40]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.