Learning Geometric Invariance Features and Discrimination Representation for Image Classification via Spatial Transform Network and XGBoost Modeling

Convolutional neural networks (CNNs) have proven to be a promising methodology for various computer vision tasks owing to their efficient hierarchical feature learning. However, a pre-trained CNN has only limited spatial invariance, since convolutional layers are not invariant to general affine transformations such as rotation and scaling, which severely affects the generalization ability of the trained model. In this work, we address this problem by leveraging recent advances in the spatial transformer network (STN) and XGBoost. Specifically, we propose a framework that combines an embedded STN with XGBoost to learn geometrically invariant features and a discriminative representation of the image data. We first build a CNN with an embedded STN to effectively extract geometrically invariant features from the input image; then, instead of the conventional softmax classifier, we adopt the efficient and fast XGBoost to learn a discriminative representation of the extracted features. We conduct a series of experiments on the benchmark Fashion-MNIST dataset to verify the effectiveness of our framework. The results demonstrate that our method not only learns geometrically invariant features of the input images but also achieves superior performance in the discriminative representation of the learned features, compared with several recent representative methods.
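The core operation that lets an STN undo affine distortions is differentiable image warping: a localization network predicts a 2x3 affine matrix theta, from which a sampling grid is built and the input is resampled by bilinear interpolation. As a minimal illustration (a NumPy sketch of the grid-generator and sampler only, not the authors' full network; function names are our own), an identity theta should reproduce the input image:

```python
import numpy as np

def affine_grid(theta, H, W):
    """Build source coordinates for each target pixel from a 2x3 affine matrix.

    Coordinates are in the normalized range [-1, 1], as in the STN paper.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # shape (3, H*W)
    return (theta @ coords).reshape(2, H, W)  # source (x, y) per target pixel

def bilinear_sample(img, grid):
    """Sample a single-channel image at the source coordinates via bilinear interpolation.

    Out-of-range coordinates are clamped to the border (a simplification; other
    padding schemes are possible).
    """
    H, W = img.shape
    # Map normalized coordinates back to pixel indices.
    x = (grid[0] + 1) * (W - 1) / 2
    y = (grid[1] + 1) * (H - 1) / 2
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0  # interpolation weights
    x0, x1 = np.clip(x0, 0, W - 1), np.clip(x1, 0, W - 1)
    y0, y1 = np.clip(y0, 0, H - 1), np.clip(y1, 0, H - 1)
    return (img[y0, x0] * (1 - wx) * (1 - wy) + img[y0, x1] * wx * (1 - wy)
            + img[y1, x0] * (1 - wx) * wy + img[y1, x1] * wx * wy)

img = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
warped = bilinear_sample(img, affine_grid(identity, 4, 4))
assert np.allclose(warped, img)  # identity transform leaves the image unchanged
```

In the full framework, theta is not fixed but predicted per image by the localization network, and because the sampler is differentiable, the whole module trains end-to-end with the downstream CNN.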