Multi-negative samples with Generative Adversarial Networks for image retrieval

Abstract: The task of image retrieval has received considerable attention from the visual AI community. However, collecting large-scale labeled images for training is laborious and expensive. Moreover, negative samples are usually treated equally, without considering how they differ from one another when compared with the query. To overcome these problems, we propose to use generated virtual images together with multiple negative samples to learn image representations for image retrieval. First, we use Generative Adversarial Networks in a semi-supervised fashion to produce virtual images with an adversarial loss. Second, considering the neighborhood structure within negative samples, we propose a random sampling algorithm to effectively mine potentially hard samples. Third, we propose a multi-negative loss function based on the Kullback–Leibler divergence. Finally, the deep neural networks are trained by optimizing the total loss, and the learned networks are then used to obtain image representations. Extensive experiments are conducted on publicly available datasets, and our model demonstrates better performance in the task of image retrieval.
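To make the idea of a multi-negative loss with the Kullback–Leibler divergence more concrete, the following is a minimal sketch under stated assumptions, not the formulation used in the paper. It assumes the query is scored against one positive and several negatives, that the scores are turned into a distribution with a softmax, and that the loss is the KL divergence from a target distribution to that predicted distribution. The function names, the default one-hot target, and the example similarity values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of similarity scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def multi_negative_kl_loss(pos_sim, neg_sims, target=None):
    """Hypothetical multi-negative loss (an assumption, not the paper's definition).

    pos_sim:  similarity between the query and the positive sample.
    neg_sims: similarities between the query and each negative sample.
    target:   desired distribution over [positive, negatives...]; defaults to
              all mass on the positive, in which case the loss reduces to
              softmax cross-entropy.
    """
    scores = [pos_sim] + list(neg_sims)
    predicted = softmax(scores)
    if target is None:
        target = [1.0] + [0.0] * len(neg_sims)
    return kl_divergence(target, predicted)


# Illustrative similarity values only: negatives far from vs. close to the query.
easy = multi_negative_kl_loss(0.9, [0.1, 0.0, -0.2])
hard = multi_negative_kl_loss(0.9, [0.8, 0.85, 0.7])
```

In this sketch, negatives that score close to the positive take more probability mass away from it and therefore produce a larger loss, which is one simple way for negatives to contribute unequally rather than being treated the same.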
