Exploiting Concept Correlation with Attributes for Semantic Binary Representation Learning

Recent years have witnessed unprecedented efforts in visual representation to enable efficient and effective multimedia applications. In this paper, we propose a novel visual representation framework that generates efficient semantic hash codes for visual samples by jointly exploiting concepts, semantic attributes, and their inter-correlations. Specifically, we construct a conceptual space in which the semantic knowledge of concepts and attributes is embedded. We then develop an effective online feature coding scheme for visual objects that leverages inter-concept relationships, using attributes as an intermediate representation. The coding process is formulated as an overlapping group lasso problem, which can be solved efficiently. Finally, we binarize the resulting visual representation to generate efficient hash codes. Extensive experiments demonstrate the superiority of the proposed framework over state-of-the-art methods on the visual retrieval task.
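
As a rough illustration of the coding and binarization steps described above, the following minimal sketch solves an overlapping group lasso problem by duplicating shared variables (the latent-variable formulation) and running proximal gradient descent, then thresholds the result into a binary code. It is not the paper's implementation: the dictionary `D`, the `groups` layout (imagined here as concepts that share attributes), the regularization weight `lam`, and the thresholding rule in `binarize` are all assumptions made for the example.

```python
import numpy as np

def overlapping_group_lasso(x, D, groups, lam=0.1, n_iter=500):
    """Sketch: min_w 0.5*||x - D w||^2 + lam * (latent overlapping group norm).

    Overlap is handled by giving each group its own copy of the columns it
    uses, which turns the problem into an ordinary (non-overlapping) group lasso.
    """
    # Column indices of the expanded dictionary, one block per group.
    idx = np.concatenate([np.asarray(g) for g in groups])
    D_exp = D[:, idx]
    bounds = np.cumsum([0] + [len(g) for g in groups])
    # Step size 1/L, where L is the squared spectral norm of D_exp.
    step = 1.0 / (np.linalg.norm(D_exp, 2) ** 2)
    v = np.zeros(D_exp.shape[1])
    for _ in range(n_iter):
        # Gradient step on the least-squares term.
        z = v - step * (D_exp.T @ (D_exp @ v - x))
        # Proximal step: block soft-thresholding, one block per group.
        for a, b in zip(bounds[:-1], bounds[1:]):
            norm = np.linalg.norm(z[a:b])
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            z[a:b] *= scale
        v = z
    # Fold the duplicated copies back onto the original coefficients.
    w = np.zeros(D.shape[1])
    np.add.at(w, idx, v)
    return w

def binarize(w, threshold=0.0):
    """Assumed binarization rule: 1 where the coefficient exceeds the threshold."""
    return (w > threshold).astype(np.uint8)
```

Here `groups` is a list of index lists that may share entries, for example `[[0, 1, 2], [2, 3], [3, 4, 5]]`. Calling `binarize(overlapping_group_lasso(x, D, groups))` yields a 0/1 vector with one bit per dictionary column.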
