CNN Descriptor Improvement Based on L2-Normalization and Feature Pooling for Patch Classification

L2-normalization and feature pooling are widely used in image classification and have achieved remarkable results. However, existing descriptors extracted from pre-trained Convolutional Neural Network (CNN) models still leave much room for improvement in meeting the precision requirements of patch classification. We generate improved CNN descriptors by applying L2-normalization and feature pooling to existing pre-trained CNN descriptors. Evaluated on the Brown dataset, the descriptor based on the Inception-v3 model with both L2-normalization and feature pooling applied reaches a mean Average Precision (mAP) of 99.27%, 98.97% and 98.02% on the three sub-datasets. Compared with pre-trained CNN descriptors without L2-normalization and feature pooling, the mAP increases by $1.41\% \sim 17.31\%$, $3.11\% \sim 17.99\%$ and $1.19\% \sim 15.24\%$, respectively. These experimental results indicate that L2-normalization and feature pooling improve the performance of pre-trained CNN descriptors.
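To make the two operations concrete, the following is a minimal sketch of how a convolutional feature map could be turned into a pooled, L2-normalized descriptor. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the pooling type, the layer used, or the order of the two steps, so global max/average pooling followed by normalization is assumed here, and the function names (`pool_feature_map`, `l2_normalize`, `describe`) and the toy feature map are hypothetical.

```python
import math


def pool_feature_map(feature_map, mode="max"):
    """Collapse each channel (an H x W grid) into one value.

    feature_map is a list of channels; each channel is a list of rows.
    Assumption: global pooling over all spatial positions of a channel.
    """
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        if mode == "max":
            pooled.append(max(values))
        elif mode == "avg":
            pooled.append(sum(values) / len(values))
        else:
            raise ValueError(f"unknown pooling mode: {mode}")
    return pooled


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against a zero norm)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


def describe(feature_map, mode="max"):
    """Assumed pipeline: feature pooling first, then L2-normalization."""
    return l2_normalize(pool_feature_map(feature_map, mode))


# Toy 2-channel, 2x2 feature map standing in for a CNN activation.
fmap = [
    [[1.0, 3.0], [0.0, 2.0]],  # channel 0: max 3.0, mean 1.5
    [[4.0, 0.0], [0.0, 0.0]],  # channel 1: max 4.0, mean 1.0
]
desc = describe(fmap, mode="max")
print(desc)
```

Because the resulting descriptors have unit length, the dot product of two descriptors equals their cosine similarity, which makes distances between patches comparable regardless of the raw activation magnitudes.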
