Cartesian K-Means

A fundamental limitation of quantization techniques like k-means clustering is the storage and run-time cost associated with the large number of clusters required to keep quantization error small and model fidelity high. We develop new models with a compositional parameterization of cluster centers, so that representational capacity grows super-linearly in the number of parameters. This allows one to quantize data effectively using billions or trillions of centers. We formulate two such models, Orthogonal k-means and Cartesian k-means. They are closely related to one another, to k-means, to methods for binary hash function optimization such as ITQ [13], and to Product Quantization for vector quantization [12]. The models are tested on large-scale approximate nearest neighbor (ANN) retrieval tasks (1M GIST and 1B SIFT features) and on codebook learning for object recognition (CIFAR-10).
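To make the compositional parameterization concrete, the following is a minimal sketch in the spirit of Product Quantization [12], the baseline that Cartesian k-means generalizes: each vector is split into m subvectors, each quantized by its own k-means codebook, so m codebooks of k sub-centers imply k^m composite centers. The function names and parameters below are illustrative rather than taken from the paper, and the learned rotation that distinguishes Cartesian k-means from plain Product Quantization is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_product_quantizer(X, m=4, k=256):
    # Split each d-dimensional vector into m subvectors and run k-means
    # independently in each subspace: m codebooks of k sub-centers each
    # imply k**m composite centers while storing only m*k sub-centers.
    d = X.shape[1]
    assert d % m == 0, "dimension must be divisible by the number of subspaces"
    s = d // m
    return [KMeans(n_clusters=k, n_init=4).fit(X[:, j*s:(j+1)*s]).cluster_centers_
            for j in range(m)]

def encode(X, codebooks):
    # A code is a tuple of m indices: the nearest sub-center per subspace.
    s = codebooks[0].shape[1]
    codes = np.empty((X.shape[0], len(codebooks)), dtype=np.int32)
    for j, C in enumerate(codebooks):
        block = X[:, j*s:(j+1)*s]
        # squared distances from each subvector to all k sub-centers
        d2 = ((block[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        codes[:, j] = d2.argmin(axis=1)
    return codes

def decode(codes, codebooks):
    # Reconstruct vectors by concatenating the selected sub-centers.
    return np.hstack([C[codes[:, j]] for j, C in enumerate(codebooks)])

# Toy usage: 128-d data, m=4 subspaces, k=16 sub-centers per subspace
# gives 16**4 = 65536 composite centers from only 64 stored sub-centers.
X = np.random.randn(1000, 128).astype(np.float32)
books = fit_product_quantizer(X, m=4, k=16)
X_hat = decode(encode(X, books), books)
```

At the scales the abstract mentions, with m = 8 and k = 256 the implied codebook has 256^8 ≈ 1.8 × 10^19 centers while storing only 8 × 256 sub-centers, and each code occupies just 8 bytes.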

References

[1] P. H. Schönemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 1966.

[2] S. P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 1982.

[3] J. B. Tenenbaum and W. T. Freeman. Separating style and content with bilinear models. Neural Computation, 2000.

[4] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. ICCV, 2003.

[5] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. Workshop on Statistical Learning in Computer Vision, ECCV, 2004.

[6] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. CVPR, 2006.

[7] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. CVPR, 2006.

[8] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. CVPR, 2007.

[9] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing visual features for multiclass and multiview object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.

[10] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

[11] H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. CVPR, 2010.

[12] H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.

[13] Y. Gong and S. Lazebnik. Iterative quantization: A Procrustean approach to learning binary codes. CVPR, 2011.

[14] A. Coates, H. Lee, and A. Y. Ng. An analysis of single-layer networks in unsupervised feature learning. AISTATS, 2011.

[15] A. Gordo, F. Perronnin, Y. Gong, and S. Lazebnik. Asymmetric distances for binary embeddings. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.

[16] M. Norouzi, A. Punjani, and D. J. Fleet. Fast search in Hamming space with multi-index hashing. CVPR, 2012.

[17] A. Babenko and V. Lempitsky. The inverted multi-index. CVPR, 2012.

[18] T. Ge, K. He, Q. Ke, and J. Sun. Optimized product quantization for approximate nearest neighbor search. CVPR, 2013.

[19] D. Xu et al. Partitioned k-means clustering for fast construction of unbiased visual vocabulary, 2013.