Paper Information - Semantic Enhanced Sketch Based Image Retrieval with Incomplete Multimodal Query

Sketch-Based Image Retrieval (SBIR) is a challenging problem, mainly because of the significant cross-domain gap between hand-drawn sketches and natural images. While extra semantic information (such as attribute details) can facilitate query-adaptive search, two challenges remain: (1) an incomplete multimodal query; (2) a lack of sketch-image paired training data. To address these, many existing multimodal sketch retrieval frameworks use text-based label information to augment the limited sketch query with more semantic detail. However, single word-level category information may not always reveal sufficient characteristics of object-specific fine-grained attributes. In this work, we propose a multimodal SBIR system that accepts both a sketch and a text-level attribute description as the query. To bridge the cross-modal gaps among sketches, images, and text, given the semantic information under consideration, two mode-specific semantic networks provide layer-wise regularizer parameters that dynamically incorporate the underlying semantics into the learned sketch feature representation, thereby transforming the initial generic sketch into a more comprehensive Semantic Enhanced Joint Embedding (SEJE). Moreover, multimodal paired samples may not always be available, during either training or testing. Therefore, instead of relying on strict one-to-one cross-modal correspondence, the joint sketch embedding SEJE is learned by capturing semantically relevant cross-modal correspondences between the averaged mode-specific semantic features (image and text) and the sketch feature, which improves SEJE's generalization ability. Evaluation on two benchmark datasets, Sketchy and TU-Berlin, validates the superiority of the proposed method over state-of-the-art methods. In fact, by allowing a user to add text attributes to form a complete multimodal query, the proposed method improves the mAP scores by 5-7% on the challenging sketch-attribute composition test scenario.
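
The abstract does not give the exact form of the layer-wise regularizer parameters, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a scale-and-shift modulation of the sketch feature at each layer (a swapped-in, FiLM-style choice, not confirmed by the abstract), a single averaged semantic vector standing in for the two mode-specific networks, and averaging over whichever semantic modalities are present when the query is incomplete. All function and parameter names (`average_semantics`, `modulate_layer`, `enhance_sketch`, `scale_w`, `shift_w`) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def average_semantics(features: Sequence[Optional[Vector]]) -> Vector:
    """Average the semantic features that are present; None marks a missing modality."""
    present = [f for f in features if f is not None]
    if not present:
        raise ValueError("at least one semantic feature is required")
    dim = len(present[0])
    if any(len(f) != dim for f in present):
        raise ValueError("semantic features must share one dimension")
    return [sum(f[i] for f in present) / len(present) for i in range(dim)]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def modulate_layer(sketch: Vector, semantic: Vector,
                   scale_w: Matrix, shift_w: Matrix) -> Vector:
    """Assumed modulation: sketch * (1 + scale) + shift, followed by ReLU.

    scale and shift are linear functions of the semantic vector; they play
    the role of the per-layer regularizer parameters in this illustration.
    """
    scale = matvec(scale_w, semantic)
    shift = matvec(shift_w, semantic)
    return [max(0.0, h * (1.0 + g) + b) for h, g, b in zip(sketch, scale, shift)]


def enhance_sketch(sketch: Vector,
                   text_feat: Optional[Vector],
                   image_feat: Optional[Vector],
                   layers: Sequence[Tuple[Matrix, Matrix]]) -> Vector:
    """Turn a generic sketch feature into a semantically modulated embedding."""
    semantic = average_semantics([text_feat, image_feat])
    hidden = list(sketch)
    for scale_w, shift_w in layers:
        hidden = modulate_layer(hidden, semantic, scale_w, shift_w)
    return hidden


# One toy layer: (scale weights, shift weights), both 2x2.
toy_layers = [([[0.5, 0.0], [0.0, 0.5]], [[0.0, 0.0], [1.0, 0.0]])]

# Incomplete query: only the text attribute feature is available.
text_only = enhance_sketch([1.0, 2.0], [1.0, 0.0], None, toy_layers)

# Complete semantics: text and image features are averaged first.
both = enhance_sketch([1.0, 2.0], [1.0, 0.0], [0.0, 1.0], toy_layers)
```

In this toy setup the same sketch feature yields different embeddings depending on which semantic inputs are supplied, which is the behavior the abstract attributes to query-adaptive search. A trained system would learn the weights rather than fix them by hand.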

Junsong Yuan | Sreyasee Das Bhattacharjee
