Undoing the codebook bias by linear transformation with sparsity and F-norm constraints for image classification

The bag of visual words model (BoW) and its variants have demonstrated their effectiveness for visual applications. The BoW model first extracts local features and generates the corresponding codebook where the elements of a codebook are viewed as visual words. However, the codebook is dataset dependent and has to be generated for each image dataset. Besides, when we only have a limited number of training images, the codebook generated correspondingly may not be able to encode images well. This requires a lot of computational time and weakens the generalization power of the BoW model. To solve these problems, in this paper, we propose to undo the dataset bias by linear codebook transformation in an unsupervised manner. To represent each point in the local feature space, we need a number of linearly independent basis vectors. We view the codebook as a linear transformation of these basis vectors. In this way, we can transform the pre-learned codebooks for a new dataset using the pseudo-inverse of the transformation matrix. However, this is an under-determined problem which may lead to many solutions. Besides, not all of the visual words are equally important for the new dataset. It would be more effective if we can make some selection and choose the discriminative visual words for transformation. Specifically, the sparsity constraints and the F-norm of the transformation matrix are used in this paper. We propose an alternative optimization algorithm to jointly search for the optimal linear transformation matrixes and the encoding parameters. The proposed method needs no labeled images from either the source dataset or the target dataset. Image classification experimental results on several image datasets show the effectiveness of the proposed method.

[1]  Cordelia Schmid,et al.  A maximum entropy framework for part-based texture and object recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[2]  Qi Tian,et al.  Undo the codebook bias by linear transformation for visual applications , 2013, ACM Multimedia.

[3]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[4]  Graham W. Taylor,et al.  Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[5]  Giovanni Maria Farinella,et al.  Scene categorization using bag of Textons on spatial hierarchy , 2008, 2008 15th IEEE International Conference on Image Processing.

[6]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[7]  Francesca Bovolo,et al.  Semisupervised One-Class Support Vector Machines for Classification of Remote Sensing Data , 2010, IEEE Transactions on Geoscience and Remote Sensing.

[8]  Gabriela Csurka,et al.  Adapted Vocabularies for Generic Visual Categorization , 2006, ECCV.

[9]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, CVPR.

[10]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[11]  Trevor Darrell,et al.  What you saw is not what you get: Domain adaptation using asymmetric kernel transforms , 2011, CVPR 2011.

[12]  Qiang Yang,et al.  Heterogeneous Transfer Learning for Image Classification , 2011, AAAI.

[13]  Alexei A. Efros,et al.  Undoing the Damage of Dataset Bias , 2012, ECCV.

[14]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[15]  Cordelia Schmid,et al.  A sparse texture representation using local affine regions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Qi Tian,et al.  A Boosting, Sparsity- Constrained Bilinear Model for Object Recognition , 2012, IEEE MultiMedia.

[17]  Changsheng Xu,et al.  Building topographic subspace model with transfer learning for sparse representation , 2010, Neurocomputing.

[18]  Lei Zhang,et al.  Image retrieval based on micro-structure descriptor , 2011, Pattern Recognit..

[19]  Changsheng Xu,et al.  User-Aware Image Tag Refinement via Ternary Semantic Analysis , 2012, IEEE Transactions on Multimedia.

[20]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[21]  Erik G. Learned-Miller,et al.  Online domain adaptation of a pre-trained cascade of classifiers , 2011, CVPR 2011.

[22]  Rama Chellappa,et al.  Domain adaptation for object recognition: An unsupervised approach , 2011, 2011 International Conference on Computer Vision.

[23]  Qi Tian,et al.  Image classification using spatial pyramid robust sparse coding , 2013, Pattern Recognit. Lett..

[24]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[25]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[26]  Cordelia Schmid,et al.  Discriminative spatial saliency for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Feiping Nie,et al.  Dyadic transfer learning for cross-domain image classification , 2011, 2011 International Conference on Computer Vision.

[28]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[29]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[30]  Alexei A. Efros,et al.  Unbiased look at dataset bias , 2011, CVPR 2011.

[31]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Gang Hua,et al.  Building contextual visual vocabulary for large-scale image applications , 2010, ACM Multimedia.

[33]  Bin Li,et al.  Transfer Learning by Reusing Structured Knowledge , 2011, AI Mag..

[34]  Cordelia Schmid,et al.  Multimodal semi-supervised learning for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35]  Qi Tian,et al.  S3MKL: scalable semi-supervised multiple kernel learning for image data mining , 2010, ACM Multimedia.

[36]  Qi Tian,et al.  Image classification by non-negative sparse coding, low-rank and sparse decomposition , 2011, CVPR 2011.

[37]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[38]  Neil D. Lawrence,et al.  Dataset Shift in Machine Learning , 2009 .

[39]  Cordelia Schmid,et al.  Semi-Local Affine Parts for Object Recognition , 2004, BMVC.

[40]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[41]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[42]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[43]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .