Semantic Enhanced Sketch Based Image Retrieval with Incomplete Multimodal Query

Sketch Based Image Retrieval (SBIR) is a challenging problem, mainly because of the significant cross-domain gap between hand-drawn sketches and natural images. While extra semantic information (such as attribute details) can facilitate query-adaptive search, two challenges remain: (1) an incomplete multimodal query and (2) a lack of sketch-image paired training data. To this end, many existing multimodal sketch retrieval frameworks use text-based label information to augment the limited sketch query with more semantic detail. However, single-word category information may not always capture an object's fine-grained attributes. In this work, we propose a multimodal SBIR system that accepts both a sketch and a text-level attribute description as the query. To bridge the cross-modal gaps among sketches, images, and text, given a semantic under consideration, two mode-specific semantic networks provide layer-wise regularizer parameters that dynamically incorporate the underlying semantic into the learned sketch feature representation, thereby transforming the initial generic sketch into a more comprehensive Semantic Enhanced Joint Embedding (SEJE). Moreover, paired multimodal samples may not always be available, during either training or testing. Therefore, instead of relying on a strict one-to-one cross-modal correspondence, SEJE is learned by capturing semantically relevant cross-modal correspondences between the averaged mode-specific semantic features (image and text) and the sketch feature, which improves SEJE's generalization ability. Evaluation on two benchmark datasets, Sketchy and TU-Berlin, shows that the proposed method outperforms state-of-the-art methods. In particular, by allowing a user to add text attributes to form a complete multimodal query, the proposed method improves mAP scores by 5–7% on the challenging sketch-attribute composition test scenario.
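The idea of enhancing a sketch feature with whatever semantic cues are available can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the form of the layer-wise regularizer parameters, so a single scale-and-shift modulation is swapped in as a stand-in, and all names (`average_semantics`, `modulate`, `w_scale`, `w_shift`) and dimensions are hypothetical. The weights are random here, whereas a real system would learn them.

```python
import numpy as np

rng = np.random.default_rng(0)


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products act as cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def average_semantics(features):
    """Average the semantic vectors that are present; None marks a missing mode."""
    available = [f for f in features if f is not None]
    if not available:
        return None
    return np.mean(np.stack(available), axis=0)


def modulate(sketch_feat, semantic, w_scale, w_shift):
    """Assumed stand-in for semantic conditioning: scale and shift the sketch
    feature with parameters predicted from the semantic vector."""
    if semantic is None:
        return sketch_feat  # sketch-only query: leave the feature unchanged
    gamma = 1.0 + np.tanh(semantic @ w_scale)  # per-dimension scale in (0, 2)
    beta = semantic @ w_shift                  # per-dimension shift
    return gamma * sketch_feat + beta


# Hypothetical sizes and random (untrained) projection weights.
dim_sem, dim_feat = 8, 16
w_scale = rng.normal(scale=0.1, size=(dim_sem, dim_feat))
w_shift = rng.normal(scale=0.1, size=(dim_sem, dim_feat))

sketch = rng.normal(size=dim_feat)
text_sem = rng.normal(size=dim_sem)

# Incomplete multimodal query: text semantics are given, image semantics are not.
semantic = average_semantics([None, text_sem])
embedding = l2_normalize(modulate(sketch, semantic, w_scale, w_shift))

# Rank a toy gallery of image features by cosine similarity to the embedding.
gallery = l2_normalize(rng.normal(size=(5, dim_feat)))
ranking = np.argsort(-(gallery @ embedding))
```

Averaging over the available modes lets the same code path serve sketch-only, sketch-plus-text, and sketch-plus-image queries, which is one simple way to tolerate a missing modality.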
