Multi-scale orderless cross-regions-pooling of deep attributes for image retrieval

How to represent an image is an essential problem in image retrieval. To build a powerful image representation, a novel method named cross-regions-pooling (CRP) is proposed, combining two key ingredients: (i) region proposals detected by an objectness detection technique; (ii) deep attributes (DA), i.e. the outputs of the softmax layer of an off-the-shelf convolutional neural network pre-trained on a large-scale dataset. The final representation of an image is the aggregation (e.g. max-pooling) of the DA extracted from all regions. In addition, a multi-scale orderless pooling strategy that accounts for the contextual layout of an image is proposed and integrated with CRP to further improve the representation. Experimental results on standard benchmarks demonstrate the superiority of the proposed method over the state of the art.
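
As a rough illustration of the aggregation described above (a minimal sketch, not the authors' implementation), the code below assumes that per-region softmax outputs have already been extracted from a pre-trained CNN; region proposal generation and the network itself are omitted, and the function names, array shapes, and toy data are hypothetical.

```python
import numpy as np

def cross_region_pooling(region_attributes):
    """Max-pool deep attributes (softmax vectors) over all region proposals.

    region_attributes: array of shape (num_regions, num_classes), where each
    row is the softmax output of a pre-trained CNN for one region proposal.
    Returns a single (num_classes,) descriptor for the image.
    """
    return region_attributes.max(axis=0)

def multi_scale_orderless_crp(attributes_per_scale):
    """Concatenate the CRP descriptors computed at several scales.

    attributes_per_scale: list of (num_regions_s, num_classes) arrays, one per
    scale (e.g. regions taken from the whole image and from sub-windows).
    """
    return np.concatenate([cross_region_pooling(a) for a in attributes_per_scale])

# Toy usage: three scales with different numbers of region proposals and
# 1000-dimensional deep attributes (e.g. ImageNet-style softmax outputs).
rng = np.random.default_rng(0)
scales = [rng.dirichlet(np.ones(1000), size=n) for n in (5, 12, 30)]
descriptor = multi_scale_orderless_crp(scales)
print(descriptor.shape)  # (3000,)
```

Max-pooling over regions makes the descriptor orderless (invariant to which region a response comes from), while concatenating per-scale descriptors preserves coarse layout information across scales.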
