Learning attribute-aware dictionary for image classification and search

The bag-of-visual-words (BoW) model has recently been widely advocated for image classification and search. However, one critical limitation of existing BoW models is their lack of semantic information. To alleviate this issue, it is imperative to construct a semantic-aware visual dictionary. In this paper, we propose a novel approach for learning a visual word dictionary that embeds intermediate-level semantics. Specifically, we first introduce an Attribute-aware Dictionary Learning (AttrDL) scheme to learn multiple sub-dictionaries with specific semantic meanings. We divide the training images into different sets, each representing a specific attribute. For each image set, an attribute-aware sub-vocabulary is learned. These sub-vocabularies are therefore more discriminative for semantics than traditional vocabularies. Second, to obtain a semantic-aware and discriminative BoW representation with the learned sub-vocabularies, we adopt the idea of L21-norm regularized sparse coding and recode the resulting sparse representation of each image. Experimental results show that the proposed scheme outperforms state-of-the-art algorithms in both image classification and search tasks.
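
As a rough illustration of the L21-norm regularized coding step, the sketch below codes a set of local descriptors over a concatenation of sub-vocabularies while penalizing the sum of the Euclidean norms of the rows of the code matrix, so that whole visual words are switched on or off together. It is a minimal sketch under assumptions that the abstract does not state: the solver (proximal gradient with row-wise shrinkage), the objective scaling, and all names (`row_shrink`, `l21_sparse_code`, `objective`, `lam`, `n_iter`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def row_shrink(X, tau):
    """Proximal operator of tau * (sum of row-wise Euclidean norms)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Rows whose norm is below tau are set to zero; the rest are scaled down.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return X * scale


def objective(Y, D, X, lam):
    """0.5 * ||Y - D X||_F^2 + lam * sum_i ||X[i, :]||_2 (assumed form)."""
    residual = 0.5 * np.linalg.norm(Y - D @ X) ** 2
    penalty = lam * np.sum(np.linalg.norm(X, axis=1))
    return float(residual + penalty)


def l21_sparse_code(Y, D, lam, n_iter=200):
    """Code descriptors Y (dim x n) over dictionary D (dim x k).

    Uses plain proximal gradient descent with step 1 / ||D||_2^2.
    Returns the code matrix X (k x n); rows correspond to visual words.
    """
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)  # gradient of the reconstruction term
        X = row_shrink(X - step * grad, step * lam)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical attribute-aware sub-vocabularies, concatenated.
    sub_a = rng.standard_normal((8, 6))
    sub_b = rng.standard_normal((8, 6))
    D = np.hstack([sub_a, sub_b])
    D /= np.linalg.norm(D, axis=0)  # unit-norm visual words
    Y = rng.standard_normal((8, 5))  # 5 local descriptors of one image
    X = l21_sparse_code(Y, D, lam=0.1)
    active = np.flatnonzero(np.linalg.norm(X, axis=1) > 0)
    print("active visual words:", active)
```

Shrinking whole rows, rather than individual entries as an L1 penalty would, is what makes all descriptors of an image share the same small set of active visual words. How the paper then recodes these codes into the final BoW vector is not described in the abstract and is not sketched here.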
