Domain-Aware SE Network for Sketch-based Image Retrieval with Multiplicative Euclidean Margin Softmax

This paper proposes a novel approach to Sketch-Based Image Retrieval (SBIR), where the key challenge is to bridge the gap between sketches and photos in terms of data representation. Inspired by the channel-wise attention explored in recent years, we present a Domain-Aware Squeeze-and-Excitation (DASE) network, which seamlessly incorporates prior knowledge of whether a sample is a sketch or a photo into the SE module and makes the SE module capable of emphasizing the appropriate channels according to the domain signal. Accordingly, the proposed network can switch its mode to obtain better domain features with lower intra-class discrepancy. Moreover, while previous works simply focus on minimizing intra-class distance and maximizing inter-class distance, we introduce a loss function, named Multiplicative Euclidean Margin Softmax (MEMS), which introduces a multiplicative Euclidean margin into the feature space and ensures that the maximum intra-class distance is smaller than the minimum inter-class distance. This facilitates learning a highly discriminative feature space and yields more accurate image retrieval results. Extensive experiments are conducted on two widely used SBIR benchmark datasets. Our approach achieves better results on both datasets, surpassing the state-of-the-art methods by a large margin.
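The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`dase_gate`, `margin_condition_holds`), the choice to append the domain signal as a single 0/1 value to the squeezed vector, the two-layer ReLU/sigmoid excitation, and the toy weights and data are all assumptions made for illustration. The first function shows how a domain flag could change which channels an SE-style gate emphasizes. The second only checks the geometric property that MEMS is said to enforce (maximum intra-class distance below minimum inter-class distance) and does not implement the loss itself.

```python
import math
from itertools import combinations


def dase_gate(channel_means, is_sketch, w1, w2):
    """Hypothetical domain-aware SE gate: one weight in (0, 1) per channel.

    channel_means: squeezed (globally averaged) activations, one per channel.
    is_sketch:     domain signal, appended to the squeezed vector as 1.0 or 0.0
                   (an assumption of this sketch).
    w1, w2:        weight matrices (lists of rows) of the two excitation layers.
    """
    x = list(channel_means) + [1.0 if is_sketch else 0.0]
    # First excitation layer followed by ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]
    # Second excitation layer followed by a sigmoid, giving per-channel gates.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def margin_condition_holds(features, labels):
    """True if the max intra-class distance is below the min inter-class distance."""
    intra, inter = [], []
    for (f_a, y_a), (f_b, y_b) in combinations(zip(features, labels), 2):
        d = math.dist(f_a, f_b)  # Euclidean distance between two feature vectors
        (intra if y_a == y_b else inter).append(d)
    if not intra or not inter:
        raise ValueError("need at least one same-class and one cross-class pair")
    return max(intra) < min(inter)


# Toy weights: 2 hidden units over (2 channels + 1 domain input), then 2 gates.
w1 = [[1.0, 0.0, 2.0], [0.0, 1.0, -2.0]]
w2 = [[1.0, -1.0], [-1.0, 1.0]]
sketch_gate = dase_gate([0.5, 0.5], True, w1, w2)   # same input, sketch mode
photo_gate = dase_gate([0.5, 0.5], False, w1, w2)   # same input, photo mode

# Toy 2-D features: two tight clusters versus interleaved classes.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
separated = margin_condition_holds(points, [0, 0, 1, 1])
interleaved = margin_condition_holds(points, [0, 1, 0, 1])
```

With these toy weights the same squeezed input yields different channel gates in sketch and photo mode, which is the "mode switching" behaviour described above. The margin check returns true for the clustered labelling and false for the interleaved one.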
