Adversarial Learning for Fine-Grained Image Search

Fine-grained image search is still a challenging problem due to the difficulty in capturing subtle differences regardless of pose variations of objects from fine-grained categories. In practice, a dynamic inventory with new fine-grained categories adds another dimension to this challenge. In this work, we propose an end-to-end network, called FGGAN, that learns discriminative representations by implicitly learning a geometric transformation from multi-view images for fine-grained rigid object retrieval. We integrate a generative adversarial network (GAN) that can automatically handle complex view and pose variations by converting them to a canonical view without any predefined transformations. Moreover, in an open-set scenario, our network is able to better match rigid objects from unseen and unknown fine-grained categories. Extensive experiments on the public CompCars dataset and a newly collected dataset have demonstrated the effectiveness of the proposed method in both closed-set and open-set scenarios.

[1]  Albert Gordo,et al.  End-to-End Learning of Deep Visual Representations for Image Retrieval , 2016, International Journal of Computer Vision.

[2]  Feng Zhou,et al.  Fine-Grained Image Classification by Exploring Bipartite-Graph Labels , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Simon Osindero,et al.  Cross-Dimensional Weighting for Aggregated Deep Convolutional Features , 2015, ECCV Workshops.

[4]  Qiaosong Wang,et al.  Visual Search at eBay , 2017, KDD.

[5]  Dimitris N. Metaxas,et al.  StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  David W. Jacobs,et al.  WarpNet: Weakly Supervised Matching for Single-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Alexei A. Efros,et al.  Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[10]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Forrest N. Iandola,et al.  Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Yuxin Peng,et al.  Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN , 2017, ACM Multimedia.

[13]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Jonathan Krause,et al.  Fine-Grained Crowdsourcing for Fine-Grained Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[16]  Josef Sivic,et al.  Convolutional Neural Network Architecture for Geometric Matching , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Tao Xiang,et al.  Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Bernt Schiele,et al.  Generative Adversarial Text to Image Synthesis , 2016, ICML.

[19]  Trevor Darrell,et al.  Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.

[20]  Andrew Zisserman,et al.  Symbiotic Segmentation and Part Localization for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision.

[21]  Jonathan Krause,et al.  Fine-grained recognition without part annotations , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Qi Tian,et al.  Fine-Grained Image Search , 2015, IEEE Transactions on Multimedia.

[23]  Jingkuan Song,et al.  Binary Generative Adversarial Networks for Image Retrieval , 2017, AAAI.

[24]  Jan Kautz,et al.  Unsupervised Image-to-Image Translation Networks , 2017, NIPS.

[25]  Anil A. Bharath,et al.  Adversarial Training for Sketch Retrieval , 2016, ECCV Workshops.

[26]  Yuxin Peng,et al.  Object-Part Attention Model for Fine-Grained Image Classification , 2017, IEEE Transactions on Image Processing.

[27]  Ronan Sicre,et al.  Particular object retrieval with integral max-pooling of CNN activations , 2015, ICLR.

[28]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[29]  Cordelia Schmid,et al.  Improving Bag-of-Features for Large Scale Image Search , 2010, International Journal of Computer Vision.

[30]  Ersin Yumer,et al.  Neural Face Editing with Intrinsic Image Disentangling , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Rong Jin,et al.  Fine-grained visual categorization via multi-stage metric learning , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Jonathan Krause,et al.  Learning Features and Parts for Fine-Grained Recognition , 2014, 2014 22nd International Conference on Pattern Recognition.

[33]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[34]  Yong Cheng,et al.  Deep Multimodal Embedding Model for Fine-grained Sketch-based Image Retrieval , 2017, SIGIR.

[35]  Xiu-Shen Wei,et al.  Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval , 2016, IEEE Transactions on Image Processing.

[36]  Xiao Liu,et al.  Kernel Pooling for Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Namil Kim,et al.  Pixel-Level Domain Transfer , 2016, ECCV.

[38]  Victor S. Lempitsky,et al.  Aggregating Local Deep Features for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[39]  Silvio Savarese,et al.  Universal Correspondence Network , 2016, NIPS.

[40]  Peng Zhang,et al.  IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models , 2017, SIGIR.

[41]  Feng Zhou,et al.  Fine-Grained Categorization and Dataset Bootstrapping Using Deep Metric Learning with Humans in the Loop , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Yoshua Bengio,et al.  Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Yuxin Peng,et al.  Weakly Supervised Learning of Part Selection Model with Spatial Constraints for Fine-Grained Image Classification , 2017, AAAI.

[45]  Atsuto Maki,et al.  From generic to specific deep representations for visual recognition , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[46]  Xiaoou Tang,et al.  A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Tianbao Yang,et al.  Hyper-class augmented and regularized deep learning for fine-grained image classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Jeff Donahue,et al.  Visual Search at Pinterest , 2015, KDD.

[49]  Jean Ponce,et al.  SCNet: Learning Semantic Correspondence , 2017, ICCV.

[50]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[51]  Yongdong Zhang,et al.  One-Shot Fine-Grained Instance Retrieval , 2017, ACM Multimedia.

[52]  Tao Mei,et al.  Deep Semantic Hashing with Generative Adversarial Networks , 2017, SIGIR.

[53]  Kaiqi Huang,et al.  GP-GAN: Towards Realistic High-Resolution Image Blending , 2017, ACM Multimedia.

[54]  Dhruv Batra,et al.  LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation , 2016, ICLR.

[55]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Yongdong Zhang,et al.  Coarse-to-Fine Description for Fine-Grained Visual Categorization , 2016, IEEE Transactions on Image Processing.

[57]  Seungryong Kim,et al.  FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Martha Larson,et al.  Pairwise geometric matching for large-scale object retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Ji Wan,et al.  Deep Learning for Content-Based Image Retrieval: A Comprehensive Study , 2014, ACM Multimedia.

[60]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[61]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[62]  Yi Yang,et al.  Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[63]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).