Infomax-principle-based pooling of deep convolutional activations for image retrieval

Neural activations produced by deep convolutional networks have recently become the state-of-the-art representation for image retrieval. To obtain a global image representation, sum-pooling is frequently used to aggregate the activations of convolutional feature maps. This work first provides a probabilistic interpretation of the effectiveness of sum-pooling, proving that the sum-pooled value upper-bounds the probability that a visual pattern is present in an image. To further assess the optimality of sum-pooling, a quantitative analysis based on the Infomax principle in neural networks is provided, showing that sum-pooling aligns well with the leading eigenvector of principal component analysis (PCA) applied to the activations of a feature map. Moreover, exploiting the 2D matrix structure of feature maps, a two-directional 2DPCA-based pooling scheme is proposed to aggregate the convolutional activations. Experiments on multiple benchmark image retrieval datasets corroborate this analysis and demonstrate the superiority of the proposed pooling scheme.
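The two pooling views in the abstract can be made concrete with a minimal NumPy sketch. It assumes hypothetical activations of shape C×H×W (C feature maps over an H×W spatial grid); the covariance construction for the two-directional 2DPCA pooling follows the standard (2D)2PCA formulation and is an illustrative sketch, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activations of one convolutional layer: C maps of size H x W.
H, W, C = 7, 7, 64
feats = rng.random((C, H, W))

# --- Sum-pooling as a fixed linear projection -------------------------------
# Summing a feature map equals the dot product of the flattened map with the
# all-ones vector; the abstract's Infomax argument is that this direction
# aligns well with the leading PCA eigenvector of the activations.
X = feats.reshape(C, -1)                     # one flattened map per channel
sum_pooled = X @ np.ones(H * W)              # shape (C,), equals per-map sums

# --- Two-directional 2DPCA pooling (sketch) ---------------------------------
# Treat each H x W map as a matrix. Estimate row- and column-direction
# covariances across the C maps, take their leading eigenvectors u and v,
# and pool each map A into the bilinear form u^T A v (one scalar per channel).
centered = feats - feats.mean(axis=0)
G_col = np.einsum('chw,chv->wv', centered, centered) / C   # W x W covariance
G_row = np.einsum('chw,cgw->hg', centered, centered) / C   # H x H covariance
v = np.linalg.eigh(G_col)[1][:, -1]          # leading eigenvector, length W
u = np.linalg.eigh(G_row)[1][:, -1]          # leading eigenvector, length H
pooled_2d = np.einsum('h,chw,w->c', u, feats, v)           # shape (C,)
```

Unlike sum-pooling, whose pooling direction is fixed to the all-ones vector, the 2DPCA variant adapts its projection directions u and v to the spatial covariance of the activations while keeping the matrix structure of each feature map.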
