DLA-Net for FG-SBIR: Dynamic Local Aligned Network for Fine-Grained Sketch-Based Image Retrieval

Fine-grained sketch-based image retrieval (FG-SBIR) is considered an ideal alternative to keyword-based image retrieval and search by image, because sketches are information-rich and easy to produce. Previous works follow a paradigm that first extracts a global image feature with a convolutional neural network and then optimizes the model with a triplet loss. These works have made many efforts to narrow the domain gap and extract discriminative features. However, they overlook the fact that a global feature is poor at capturing fine-grained details. In this paper, we argue that local features are more discriminative than the global feature in FG-SBIR, and we explore an effective way to use them. Specifically, we first propose the Local Aligned Network (LA-Net), which solves FG-SBIR by directly aligning mid-level local features. Experiments show that it beats all previous baselines and is easy to implement, and we hope LA-Net can serve as a new strong baseline for FG-SBIR. Next, we propose the Dynamic Local Aligned Network (DLA-Net) to enhance LA-Net. LA-Net does not account for the spatial misalignment caused by the abstraction of sketches. To solve this problem, we introduce a dynamic alignment mechanism into LA-Net. This mechanism lets the sketch interact with the photo and dynamically decide where to align depending on the photo it is compared against. Experiments indicate that DLA-Net successfully addresses the spatial misalignment problem. It gains a significant performance boost over LA-Net and outperforms the state of the art in FG-SBIR. To the best of our knowledge, DLA-Net is the first model that beats humans on all datasets: QMUL FG-SBIR, QMUL Handbag, and Sketchy.
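The contrast between position-wise alignment and photo-dependent alignment can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how local features are compared or how the dynamic mechanism works, so the nearest-neighbour matching used here is only an assumed stand-in, and every function name, the toy feature vectors, and the margin value are hypothetical.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fixed_local_distance(sketch, photo):
    """Mean distance between local features at the SAME spatial position.

    Assumption: a feature map is a flat list of local feature vectors.
    """
    return sum(l2(s, p) for s, p in zip(sketch, photo)) / len(sketch)

def dynamic_local_distance(sketch, photo):
    """Mean distance from each sketch feature to its closest photo feature.

    Assumption: nearest-neighbour matching stands in for a dynamic
    alignment mechanism; the paper's actual mechanism may differ.
    """
    return sum(min(l2(s, p) for p in photo) for s in sketch) / len(sketch)

def triplet_loss(anchor, positive, negative, dist, margin=0.3):
    """Standard triplet hinge loss under a pluggable distance function."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# Toy data: the matching photo contains the same parts as the sketch,
# but at swapped positions (spatial misalignment).
sketch = [[1.0, 0.0], [0.0, 1.0]]
photo_pos = [[0.0, 1.0], [1.0, 0.0]]
photo_neg = [[0.5, 0.5], [0.5, 0.5]]

loss_fixed = triplet_loss(sketch, photo_pos, photo_neg, fixed_local_distance)
loss_dynamic = triplet_loss(sketch, photo_pos, photo_neg, dynamic_local_distance)
```

In this toy case the position-wise distance ranks the wrong photo closer because the parts are shifted, so `loss_fixed` is positive, whereas the photo-dependent matching recovers the correspondence and `loss_dynamic` is zero.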
