Unsupervised deep quantization for object instance search

Abstract: In this paper, we propose an unsupervised deep quantization (UDQ) method for object instance search. UDQ uses product quantization to discover the self-supervision information underlying the training data, and iteratively exploits that information to optimize the features of the training data in an unsupervised fashion. The optimized features are in turn used to update the self-supervision information for the subsequent round of training. We introduce two constraints, the separability constraint and the discriminability constraint, to encourage the features to form a cluster structure, which is essential for product quantization to generate effective supervision information. UDQ is trained with an iterative optimization strategy which ensures that the features and the supervision information enhance each other alternately within a unified model. Moreover, we develop three refinement strategies that refine the features to obtain better supervision information for model optimization. Experimental results on four datasets show that UDQ outperforms state-of-the-art methods.
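
The idea of using product quantization as a source of self-supervision can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows how splitting features into subspaces and clustering each subspace yields per-subspace codeword indices that could serve as pseudo-labels. The function names (`kmeans`, `pq_pseudo_labels`), the plain Lloyd's k-means, the random features, and all sizes are assumptions chosen for illustration; the network training step, the two constraints, and the refinement strategies are not modeled here.

```python
import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means (illustrative); returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d.argmin(axis=1)


def pq_pseudo_labels(features, num_subspaces=4, num_centroids=8):
    """Split features into subspaces and cluster each one independently.

    Returns codebooks of shape (num_subspaces, num_centroids, sub_dim) and
    codes of shape (n, num_subspaces); each code column is one set of
    pseudo-labels (hypothetical use, assumed for this sketch).
    """
    n, dim = features.shape
    assert dim % num_subspaces == 0, "dim must be divisible by num_subspaces"
    sub_dim = dim // num_subspaces
    codes = np.empty((n, num_subspaces), dtype=np.int64)
    codebooks = []
    for m in range(num_subspaces):
        sub = features[:, m * sub_dim:(m + 1) * sub_dim]
        centroids, assign = kmeans(sub, num_centroids, seed=m)
        codebooks.append(centroids)
        codes[:, m] = assign
    return np.stack(codebooks), codes


# Random stand-in for features extracted by a network (assumption).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 32)).astype(np.float32)
codebooks, codes = pq_pseudo_labels(features)
```

In an alternating scheme of the kind the abstract describes, the codes would be recomputed after each round of feature optimization, so that better features produce better pseudo-labels and vice versa.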
