Creating efficient codebooks for visual recognition

Visual codebook based quantization of robust appearance descriptors extracted from local image patches is an effective means of capturing image statistics for texture analysis and scene classification. Codebooks are usually constructed by using a method such as k-means to cluster the descriptor vectors of patches sampled either densely ('textons') or sparsely ('bags of features' based on key-points or salience measures) from a set of training images. This works well for texture analysis in homogeneous images, but the images that arise in natural object recognition tasks have far less uniform statistics. We show that for dense sampling, k-means over-adapts to this, clustering centres almost exclusively around the densest few regions in descriptor space and thus failing to code other informative regions. This gives suboptimal codes that are no better than using randomly selected centres. We describe a scalable acceptance-radius based clusterer that generates better codebooks and study its performance on several image classification tasks. We also show that dense representations outperform equivalent keypoint based ones on these tasks and that SVM or mutual information based feature selection starting from a dense codebook further improves the performance.

[1]  B. Julesz Textons, the elements of texture perception, and their interactions , 1981, Nature.

[2]  David R. Karger,et al.  Scatter/Gather: a cluster-based approach to browsing large document collections , 1992, SIGIR '92.

[3]  D. Geman,et al.  Invariant Statistics and Coding of Natural Microimages , 1998 .

[4]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[5]  Adam Meyerson,et al.  Online facility location , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[6]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Marko Grobelnik,et al.  Interaction of Feature Selection Methods and Linear Classification Models , 2002 .

[8]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[9]  B. Schiele,et al.  Interleaved Object Categorization and Segmentation , 2003, BMVC.

[10]  Tomaso A. Poggio,et al.  Face recognition: component-based versus global approaches , 2003, Comput. Vis. Image Underst..

[11]  Constance de Koning,et al.  Editors , 2003, Annals of Emergency Medicine.

[12]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[13]  Cordelia Schmid,et al.  Selection of scale-invariant parts for object class recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[14]  Shimon Ullman,et al.  Object recognition with informative features and linear classification , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[15]  Ilan Shimshoni,et al.  Mean shift based clustering in high dimensions: a texture classification example , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[16]  Taeho Jo,et al.  A Multiple Resampling Method for Learning from Imbalanced Data Sets , 2004, Comput. Intell..

[17]  Adam Meyerson,et al.  A k-Median Algorithm with Running Time Independent of Data Size , 2004, Machine Learning.

[18]  A. Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[19]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[20]  David Mumford,et al.  Occlusion Models for Natural Images: A Statistical Study of a Scale-Invariant Dead Leaves Model , 2004, International Journal of Computer Vision.

[21]  C. Schmid,et al.  Scale-invariant shape features for recognition of object categories , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[22]  Marko Grobelnik,et al.  Feature selection using linear classifier weights: interaction with classification models , 2004, SIGIR '04.

[23]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[24]  Dan Roth,et al.  Learning to detect objects in images via a sparse, part-based representation , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[26]  Lixin Fan,et al.  Categorizing Nine Visual Classes using Local Appearance Descriptors , 2004 .

[27]  Cordelia Schmid,et al.  A sparse texture representation using local affine regions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Song-Chun Zhu,et al.  What are Textons? , 2005, Int. J. Comput. Vis..