Undo the codebook bias by linear transformation for visual applications

The bag of visual words (BoW) model and its variants have demonstrated their effectiveness for visual applications and have been widely used by researchers. The BoW model first extracts local features and generates a corresponding codebook, whose elements are viewed as visual words. The local features within each image are then encoded to obtain the final histogram representation. However, the codebook is dataset dependent and has to be generated anew for each image dataset. This costs a lot of computational time and weakens the generalization power of the BoW model. To solve these problems, in this paper we propose to undo the dataset bias by linear transformation of the codebook. To represent every point in the local feature space under Euclidean distance, the number of bases must be no less than the dimensionality of the space. Hence, each codebook can be viewed as a linear transformation of these bases. In this way, we can transform pre-learned codebooks for use on a new dataset. However, not all visual words are equally important for the new dataset. It is more effective to apply sparsity constraints and select the most discriminative visual words for transformation. We propose an alternating optimization algorithm to jointly search for the optimal linear transformation matrices and the encoding parameters. Image classification experiments on several image datasets show the effectiveness of the proposed method.
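As a rough illustration of the idea, the following sketch builds a new codebook as a linear transformation of a pre-learned one and then encodes local features into a BoW histogram. It is only a minimal sketch under stated assumptions: the names `encode_histogram`, `transform_codebook`, and `W` are hypothetical, the data are random, hard nearest-word assignment stands in for whatever encoding the paper uses, and `W` is fixed by hand here rather than learned jointly with the encoding parameters under sparsity constraints.

```python
import numpy as np


def encode_histogram(features, codebook):
    """Hard-assign each local feature to its nearest visual word
    (Euclidean distance) and return a normalized histogram."""
    # features: (n, d), codebook: (k, d) -> squared distances: (n, k)
    diffs = features[:, None, :] - codebook[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    nearest = sq_dists.argmin(axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()


def transform_codebook(codebook, W):
    """Return a codebook whose words are linear combinations of the
    pre-learned words. W has shape (k_new, k_old)."""
    return W @ codebook


rng = np.random.default_rng(0)

# Hypothetical pre-learned codebook: 8 visual words in a 4-dimensional
# feature space (number of words >= space dimension).
source_codebook = rng.normal(size=(8, 4))

# Assumption: a hand-picked sparse transformation that simply keeps the
# first five words. A learned W would be optimized instead.
W = np.eye(8)[:5]
target_codebook = transform_codebook(source_codebook, W)

# Hypothetical local features from the new dataset.
features = rng.normal(size=(100, 4))
hist = encode_histogram(features, target_codebook)
```

Each row of `W` with few non-zero entries uses only a few pre-learned visual words, which is one simple way to picture the word selection that the sparsity constraints are meant to encourage.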
