Vector Quantization Enhancement for Computer Vision Tasks

This paper augments the Bag-of-Word scheme in several respects: we incorporate a category label into the clustering process, build classifier-tailored codebooks, and weight codewords according to their probability to occur. A size-adaptive feature clustering algorithm is also proposed as an alternative to k-means. Experiments on the PASCAL VOC 2007 challenge validate the approach for classical hard-assignment as well as VLAD encoding.

[1]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[2]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[3]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4]  Saturnino Maldonado-Bascón,et al.  Heterogeneous Visual Codebook Integration Via Consensus Clustering for Visual Categorization , 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[5]  Claire Cardie,et al.  Proceedings of the Eighteenth International Conference on Machine Learning, 2001, p. 577–584. Constrained K-means Clustering with Background Knowledge , 2022 .

[6]  Andrew Zisserman,et al.  All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[9]  Tao Mei,et al.  Contextual Bag-of-Words for Visual Categorization , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[10]  Adriana Kovashka,et al.  Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  David Picard,et al.  Compact tensor based image representation for similarity search , 2012, 2012 19th IEEE International Conference on Image Processing.

[12]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Junsong Yuan,et al.  Combining Feature Context and Spatial Context for Image Pattern Discovery , 2011, 2011 IEEE 11th International Conference on Data Mining.

[14]  Limin Wang,et al.  Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics , 2014, ECCV.

[15]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[16]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Gang Hua,et al.  Modeling spatial and semantic cues for large-scale near-duplicated image retrieval , 2011, Comput. Vis. Image Underst..

[19]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[21]  Bernt Schiele,et al.  Learning semantic object parts for object categorization , 2008, Image Vis. Comput..

[22]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[23]  Frédéric Jurie,et al.  Latent mixture vocabularies for object categorization and segmentation , 2006, Image Vis. Comput..

[24]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Patrick Pérez,et al.  Revisiting the VLAD image representation , 2013, ACM Multimedia.

[26]  Chunheng Wang,et al.  Action Recognition Using Context-Constrained Linear Coding , 2012, IEEE Signal Processing Letters.

[27]  Ramakant Nevatia,et al.  Video segmentation and feature co-occurrences for activity classification , 2014, IEEE Winter Conference on Applications of Computer Vision.

[28]  Frédéric Jurie,et al.  Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[30]  Yang Yang,et al.  Learning semantic visual vocabularies using diffusion distance , 2009, CVPR.

[31]  Rong Jin,et al.  Unifying discriminative visual codebook generation with classifier training for object category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[33]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[34]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.