StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

Sketch-based image retrieval (SBIR) is a cross-modal matching problem which is typically solved by learning a joint embedding space where the semantic content shared between the photo and sketch modalities is preserved. However, a fundamental challenge in SBIR has been largely ignored so far: sketches are drawn by humans, and considerable style variation exists among different users. An effective SBIR model needs to account for this style diversity explicitly and, crucially, to generalise to unseen user styles. To this end, a novel style-agnostic SBIR model is proposed. Unlike existing models, it employs a cross-modal variational autoencoder (VAE) to explicitly disentangle each sketch into a semantic content part shared with the corresponding photo and a style part unique to the sketcher. Importantly, to make our model dynamically adaptable to any unseen user style, we propose to meta-train our cross-modal VAE by adding two style-adaptive components: a set of feature transformation layers in its encoder and a regulariser on the disentangled semantic content latent code. With this meta-learning framework, our model not only disentangles the cross-modal shared semantic content for SBIR, but also adapts the disentanglement to any unseen user style, making the SBIR model truly style-agnostic. Extensive experiments show that our style-agnostic model yields state-of-the-art performance for both category-level and instance-level SBIR.
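
As a rough illustration of the disentanglement idea only, the toy sketch below encodes a feature vector, applies a feature-wise affine modulation, and splits a sampled Gaussian latent code into a content part and a style part. It is not the authors' implementation: the function names, the linear encoder, the scale-and-shift form of the feature transformation, and all dimensions are assumptions made for this example, and the meta-training loop and regulariser are omitted.

```python
import numpy as np

def feature_transform(h, gamma, beta):
    """Feature-wise affine modulation: scale and shift each feature (assumed form)."""
    return gamma * h + beta

def encode_sketch(x, W, gamma, beta, content_dim, rng):
    """Toy encoder: linear layer -> feature transformation -> split Gaussian latent."""
    h = np.tanh(x @ W)                      # encoder features
    h = feature_transform(h, gamma, beta)   # hypothetical style-adaptive modulation
    mu, log_var = np.split(h, 2, axis=-1)   # parameters of a diagonal Gaussian
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps    # reparameterisation trick
    # First content_dim entries act as the content code, the rest as the style code.
    return z[..., :content_dim], z[..., content_dim:]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32))            # 4 toy sketch feature vectors
W = rng.standard_normal((32, 16)) * 0.1     # maps to 2 * latent_dim, with latent_dim = 8
gamma, beta = np.ones(16), np.zeros(16)     # identity modulation for the demo
content, style = encode_sketch(x, W, gamma, beta, content_dim=5, rng=rng)
print(content.shape, style.shape)
```

In a retrieval setting, only the content code would be compared against photo embeddings, while the style code would be set aside; how the two parts are trained to separate is the subject of the paper itself and is not captured here.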
