Weighted bag of visual words for object recognition

Bag of Visual Words (BoV) is one of the most successful strategies for object recognition. It represents an image as a vector of visual-word counts over a learned vocabulary. The representation is built from patches that are either densely extracted or sampled from the image with feature detectors. However, dense extraction also captures noisy background information, whereas feature detection can miss important parts of the objects. In this paper we propose a solution in between these two strategies: we densely extract patches from the image and weight them according to their saliency. Intuitively, highly salient patches play an important role in describing an object, while patches with low saliency are kept with low emphasis instead of being discarded. We embed this idea in the word-encoding mechanism adopted in BoV approaches. The technique is successfully applied to vector quantization and Fisher vector encoding on Caltech-101 and Caltech-256.
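The weighting idea can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. We assume that each densely extracted patch comes with one non-negative saliency value from some saliency detector. For vector quantization we assume hard nearest-word assignment, where each patch votes for its word with its saliency instead of a count of 1. For the Fisher vector we assume a diagonal-covariance GMM, multiply each patch's posterior by its saliency, and replace the usual 1/T average with division by the total saliency. The helper names `weighted_vq_histogram` and `weighted_fisher_vector` are hypothetical.

```python
import numpy as np


def weighted_vq_histogram(descriptors, saliency, vocabulary):
    """Saliency-weighted bag-of-visual-words histogram (hypothetical helper).

    descriptors: (N, D) densely extracted patch descriptors
    saliency:    (N,) non-negative per-patch saliency weights (assumed given)
    vocabulary:  (K, D) visual words, e.g. k-means centroids
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # hard assignment to the nearest word
    # Each patch votes with its saliency instead of 1, so low-saliency
    # patches still contribute, just with less emphasis.
    hist = np.bincount(words, weights=saliency, minlength=vocabulary.shape[0])
    total = hist.sum()
    return hist / total if total > 0 else hist


def weighted_fisher_vector(descriptors, saliency, weights, means, sigmas):
    """Saliency-weighted Fisher vector for a diagonal GMM (hypothetical helper).

    weights: (K,) mixture weights; means, sigmas: (K, D) per-component
    means and standard deviations. Returns a vector of length 2*K*D.
    """
    s = np.asarray(saliency, dtype=float)
    z = s.sum()
    if z <= 0:
        raise ValueError("saliency weights must have a positive sum")
    # Whitened differences (x_t - mu_k) / sigma_k, shape (N, K, D).
    diff = (descriptors[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log of w_k * N(x_t | mu_k, diag(sigma_k^2)), shape (N, K).
    log_p = (np.log(weights)[None, :]
             - 0.5 * (diff ** 2).sum(axis=2)
             - np.log(sigmas).sum(axis=1)[None, :]
             - 0.5 * descriptors.shape[1] * np.log(2 * np.pi))
    # Posteriors gamma_t(k) via a numerically stable softmax over components.
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Saliency replaces the uniform 1/T weighting of the standard encoding.
    wg = gamma * s[:, None]
    g_mu = (wg[:, :, None] * diff).sum(axis=0) / (z * np.sqrt(weights)[:, None])
    g_sigma = ((wg[:, :, None] * (diff ** 2 - 1)).sum(axis=0)
               / (z * np.sqrt(2 * weights)[:, None]))
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Power ("signed square root") and L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv
```

When all saliency values are equal, both functions reduce to the standard unweighted encodings: a normalized count histogram and the usual power- and L2-normalized Fisher vector. This makes the weighting easy to ablate. Setting a weight to zero removes that patch entirely, as a sparse detector would, so the sketch spans the range between the two strategies the abstract contrasts.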
