Paper information - Vector Quantization Enhancement for Computer Vision Tasks

Vector Quantization Enhancement for Computer Vision Tasks

This paper augments the Bag-of-Words scheme in several respects: we incorporate a category label into the clustering process, build classifier-tailored codebooks, and weight codewords according to their probability of occurrence. A size-adaptive feature clustering algorithm is also proposed as an alternative to k-means. Experiments on the PASCAL VOC 2007 challenge validate the approach for classical hard-assignment as well as VLAD encoding.
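For readers unfamiliar with the two encodings named in the abstract, the following is a minimal NumPy sketch of standard hard-assignment Bag-of-Words and standard VLAD encoding over a given codebook. It is not the paper's method: the function names, the optional `weights` argument (a hypothetical stand-in for any per-codeword weighting), and the L2 normalisation choice are all assumptions made for illustration, and the codebook is taken as given rather than learned.

```python
import numpy as np

def nearest_codeword(descriptors, codebook):
    """Index of the closest codeword (squared Euclidean) for each descriptor."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def bow_histogram(descriptors, codebook, weights=None):
    """Hard-assignment Bag-of-Words: count descriptors per codeword.

    `weights` is a hypothetical per-codeword weight vector (an assumption,
    not the weighting scheme of the paper).
    """
    k = codebook.shape[0]
    assign = nearest_codeword(descriptors, codebook)
    hist = np.bincount(assign, minlength=k).astype(float)
    if weights is not None:
        hist = hist * weights
    total = hist.sum()
    return hist / total if total > 0 else hist

def vlad(descriptors, codebook):
    """Standard VLAD: sum of residuals to the assigned codeword, L2-normalised."""
    k, d = codebook.shape
    assign = nearest_codeword(descriptors, codebook)
    out = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assign == i]
        if len(members):
            out[i] = (members - codebook[i]).sum(axis=0)
    out = out.ravel()
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

# Toy usage: two 2-D codewords and three local descriptors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]])
hist = bow_histogram(descriptors, codebook)
vlad_vec = vlad(descriptors, codebook)
```

The histogram keeps only counts per codeword, whereas VLAD keeps the accumulated residuals, so its output has length k times the descriptor dimension.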

Noel E. O'Connor | Rémi Trichet

[1] J. A. Hartigan,et al. A k-means clustering algorithm , 1979 .

[2] Luc Van Gool,et al. The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[3] Andrew Zisserman,et al. Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4] Saturnino Maldonado-Bascón,et al. Heterogeneous Visual Codebook Integration Via Consensus Clustering for Visual Categorization , 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[5] Claire Cardie,et al. Constrained K-means Clustering with Background Knowledge , 2001, Proceedings of the Eighteenth International Conference on Machine Learning, p. 577–584.

[6] Andrew Zisserman,et al. All About VLAD , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Cordelia Schmid,et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8] Thomas Mensink,et al. Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[9] Tao Mei,et al. Contextual Bag-of-Words for Visual Categorization , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[10] Adriana Kovashka,et al. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11] David Picard,et al. Compact tensor based image representation for similarity search , 2012, 2012 19th IEEE International Conference on Image Processing.

[12] Florent Perronnin,et al. Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Junsong Yuan,et al. Combining Feature Context and Spatial Context for Image Pattern Discovery , 2011, 2011 IEEE 11th International Conference on Data Mining.

[14] Limin Wang,et al. Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics , 2014, ECCV.

[15] Thomas S. Huang,et al. Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[16] Antonio Criminisi,et al. Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17] Michael Isard,et al. Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Gang Hua,et al. Modeling spatial and semantic cues for large-scale near-duplicated image retrieval , 2011, Comput. Vis. Image Underst..

[19] Yihong Gong,et al. Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20] Ali S. Hadi,et al. Finding Groups in Data: An Introduction to Cluster Analysis , 1991 .

[21] Bernt Schiele,et al. Learning semantic object parts for object categorization , 2008, Image Vis. Comput..

[22] Andrea Vedaldi,et al. Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[23] Frédéric Jurie,et al. Latent mixture vocabularies for object categorization and segmentation , 2006, Image Vis. Comput..

[24] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25] Patrick Pérez,et al. Revisiting the VLAD image representation , 2013, ACM Multimedia.

[26] Chunheng Wang,et al. Action Recognition Using Context-Constrained Linear Coding , 2012, IEEE Signal Processing Letters.

[27] Ramakant Nevatia,et al. Video segmentation and feature co-occurrences for activity classification , 2014, IEEE Winter Conference on Applications of Computer Vision.

[28] Frédéric Jurie,et al. Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29] Gabriela Csurka,et al. Visual categorization with bags of keypoints , 2004, ECCV 2004 Workshop on Statistical Learning in Computer Vision.

[30] Yang Yang,et al. Learning semantic visual vocabularies using diffusion distance , 2009, CVPR.

[31] Rong Jin,et al. Unifying discriminative visual codebook generation with classifier training for object category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32] Andrew Zisserman,et al. The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[33] Cordelia Schmid,et al. A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..
