Query Based Object Retrieval Using Neural Codes

Retrieving a specific object from an image that is similar to a query object is one of the critical applications in computer vision. Existing methods fail to return similar objects when the region of interest is not specified correctly in the query image. Furthermore, when the feature vector is large, retrieval from big collections is usually computationally expensive. In this paper, we propose an object retrieval method based on the neural codes (activations) generated by the last inner-product layer of the Faster R-CNN network, demonstrating that the network can be used not only for object detection but also for retrieval. To evaluate the method, we use a subset of ImageNet comprising images of indoor scenes. To speed up retrieval, we first process all the images in the dataset and save the information (i.e., neural codes, objects present in the image, confidence scores, and bounding box coordinates) corresponding to each detected object. Then, given a query image, the system detects the object present and extracts its neural codes, which are used to compute the cosine similarity against the saved neural codes. We retrieve the objects with high cosine similarity scores and compare them with the results obtained using confidence scores. We show that our approach takes only 0.534 s to retrieve all 1454 objects in our test set.
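The retrieval step described above (comparing a query object's neural codes against saved codes by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the `DetectedObject` record, its field names, the `retrieve` helper, and the tiny three-dimensional vectors are all hypothetical stand-ins. Real neural codes would come from the last inner-product layer of Faster R-CNN and would be much longer.

```python
import math
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """One saved detection. Field names are hypothetical, chosen to mirror
    the information the abstract says is stored per detected object."""
    image_id: str
    label: str
    confidence: float
    box: tuple   # bounding box as (x1, y1, x2, y2)
    code: list   # neural code (activation vector); toy-sized here


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve(query_code, index, top_k=5):
    """Rank saved detections by cosine similarity to the query code."""
    scored = [(cosine_similarity(query_code, obj.code), obj) for obj in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index standing in for the precomputed, saved detections.
index = [
    DetectedObject("img_001", "chair", 0.98, (10, 20, 110, 220), [0.9, 0.1, 0.0]),
    DetectedObject("img_002", "table", 0.91, (5, 40, 300, 200), [0.1, 0.8, 0.3]),
    DetectedObject("img_003", "chair", 0.75, (50, 60, 150, 260), [0.8, 0.2, 0.1]),
]

query_code = [1.0, 0.1, 0.0]  # made-up code for the query object
results = retrieve(query_code, index, top_k=2)
for score, obj in results:
    print(f"{obj.image_id} {obj.label} similarity={score:.3f}")
```

Because the codes are computed and saved once per detected object, answering a query only needs one detection pass on the query image followed by similarity comparisons, which is the source of the speed-up the abstract describes. Note that in this toy data the lower-confidence chair (`img_003`) still ranks above the table, illustrating how a similarity ranking can differ from a confidence-score ranking.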
