Visual Word Aggregation

Most recent category-level object recognition systems work with visual words, i.e., vector-quantized local descriptors. These visual vocabularies are usually built with a single method, such as K-means, that clusters the descriptor vectors of patches sampled either densely or sparsely from a set of training images. In this paper we instead propose a novel methodology for building efficient codebooks for visual recognition using clustering aggregation techniques: Visual Word Aggregation (VWA). Our aim is threefold: to increase the stability of the visual vocabulary construction process, to increase the image classification rate, and to determine the size of the visual codebook automatically. Image classification results are reported on the PASCAL VOC Challenge 2007 testbed.
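To make the idea of clustering aggregation concrete, the following is a minimal sketch and not the VWA algorithm itself, which the abstract does not specify. It assumes that several base clusterings of the same descriptors are already available (for example, from repeated K-means runs) and merges descriptors that fewer than half of those clusterings separate. The function names `disagreement` and `aggregate`, the 0.5 threshold, and the toy label lists are all hypothetical choices made for illustration.

```python
from itertools import combinations


def disagreement(labelings, i, j):
    """Fraction of base clusterings that put items i and j in different clusters."""
    apart = sum(1 for labels in labelings if labels[i] != labels[j])
    return apart / len(labelings)


def aggregate(labelings, threshold=0.5):
    """Consensus clustering by linking every pair whose disagreement is below threshold.

    This is a simple single-linkage heuristic over pairwise disagreement
    (an assumption for illustration), implemented with union-find.
    The number of consensus clusters is not fixed in advance.
    """
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if disagreement(labelings, i, j) < threshold:
            parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Toy input: three base clusterings of six descriptors (hypothetical labels).
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
consensus = aggregate(base_clusterings)
codebook_size = len(set(consensus))
print(consensus, codebook_size)
```

In this sketch the codebook size is simply the number of consensus clusters that emerge, which illustrates how an aggregation step can determine the vocabulary size instead of taking it as an input parameter.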
