Image Representation with Bag-of-Words

Image classification, the task of assigning one or more category labels to an image, is an active research topic in computer vision and pattern recognition. Its applications include video surveillance, remote sensing, web content analysis, and biometrics. Many successful models transform low-level descriptors into richer mid-level representations. Extracting mid-level features involves a sequence of interchangeable modules, but these modules always fall into two major parts: Bag-of-Words (BoW) and Spatial Pyramid Matching (SPM). The goal is to embed low-level descriptors in a representative codebook space. First, low-level descriptors are extracted at interest points or on dense grids. Then, a pre-defined codebook is used to encode each descriptor with a specific coding scheme. The resulting code, which can be regarded as a mid-level descriptor, is normally a vector with binary or continuous elements, depending on the coding scheme. Next, the image is divided into increasingly finer spatial subregions, and the codes from each subregion are pooled into a histogram by averaging or normalizing. Finally, the image representation is generated by concatenating the histograms from all subregions. In this chapter, we introduce the key techniques employed in the BoW framework, including SPM, namely the coding process and the pooling process.
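The pipeline described above can be illustrated with a minimal sketch. It assumes the codebook is already given (for example, learned beforehand by clustering), uses hard assignment as the coding scheme and averaging as the pooling operator, and takes descriptor positions normalized to [0, 1). The function names `hard_assignment_codes` and `spatial_pyramid_pool` are hypothetical and chosen for illustration only; other coding and pooling schemes would slot into the same structure.

```python
import numpy as np

def hard_assignment_codes(descriptors, codebook):
    """Encode each descriptor as a one-hot vector over its nearest codeword."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    codes = np.zeros((descriptors.shape[0], codebook.shape[0]))
    codes[np.arange(descriptors.shape[0]), nearest] = 1.0
    return codes

def spatial_pyramid_pool(codes, positions, levels=2):
    """Average-pool codes in each cell of a 1x1, 2x2, 4x4, ... grid.

    positions is an (N, 2) array of (x, y) coordinates in [0, 1).
    The per-cell histograms are concatenated into one vector.
    """
    histograms = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Grid cell index of each descriptor along x and y at this level.
        cell_xy = np.minimum((positions * cells).astype(int), cells - 1)
        for cy in range(cells):
            for cx in range(cells):
                mask = (cell_xy[:, 0] == cx) & (cell_xy[:, 1] == cy)
                if mask.any():
                    histograms.append(codes[mask].mean(axis=0))
                else:
                    # Empty cell: contribute an all-zero histogram.
                    histograms.append(np.zeros(codes.shape[1]))
    return np.concatenate(histograms)

# Toy usage with random data standing in for real descriptors (assumption).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 128))   # e.g. 200 local descriptors
codebook = rng.normal(size=(16, 128))       # 16 codewords
positions = rng.random((200, 2))            # normalized (x, y) locations
codes = hard_assignment_codes(descriptors, codebook)
feature = spatial_pyramid_pool(codes, positions, levels=2)
print(feature.shape)  # (336,) = 16 codewords x (1 + 4 + 16) cells
```

With three pyramid levels the representation grows to 21 times the codebook size, which is why the choice of codebook size and pooling operator matters in practice.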
