Learning Geometric Invariance Features and Discrimination Representation for Image Classification via Spatial Transform Network and XGBoost Modeling

Convolutional neural networks (CNNs) have proven to be a promising methodology for various computer vision tasks owing to their efficient hierarchical feature learning. However, a pre-trained CNN has only limited spatial invariance, since convolutional layers are not invariant to general affine transformations such as rotation and scaling, which severely affects the generalization ability of the trained model. In this work, we address this problem by leveraging recent advances in the spatial transformer network (STN) and XGBoost. Specifically, we propose a framework that combines an embedded STN with XGBoost to learn geometrically invariant features and a discriminative representation of the image data. We first build a CNN with an embedded STN to effectively extract geometrically invariant features from the input image; then, instead of the conventional softmax classifier, we adopt the efficient and fast XGBoost to learn a discriminative representation of the extracted features. We conduct a series of experiments on the benchmark Fashion-MNIST dataset to verify the effectiveness of our framework. The results demonstrate that our method not only learns geometrically invariant features of the input images but also achieves superior performance in the discriminative representation of the learned features, compared with several recent representative methods.
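The core operation that lets an STN undo affine distortions is differentiable image warping: a localization network predicts a 2x3 affine matrix theta, from which a sampling grid is built and the input is resampled by bilinear interpolation. As a minimal illustration (a NumPy sketch of the grid-generator and sampler only, not the authors' full network; function names are our own), an identity theta should reproduce the input image:

```python
import numpy as np

def affine_grid(theta, H, W):
    """Build source coordinates for each target pixel from a 2x3 affine matrix.

    Coordinates are in the normalized range [-1, 1], as in the STN paper.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # shape (3, H*W)
    return (theta @ coords).reshape(2, H, W)  # source (x, y) per target pixel

def bilinear_sample(img, grid):
    """Sample a single-channel image at the source coordinates via bilinear interpolation.

    Out-of-range coordinates are clamped to the border (a simplification; other
    padding schemes are possible).
    """
    H, W = img.shape
    # Map normalized coordinates back to pixel indices.
    x = (grid[0] + 1) * (W - 1) / 2
    y = (grid[1] + 1) * (H - 1) / 2
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0  # interpolation weights
    x0, x1 = np.clip(x0, 0, W - 1), np.clip(x1, 0, W - 1)
    y0, y1 = np.clip(y0, 0, H - 1), np.clip(y1, 0, H - 1)
    return (img[y0, x0] * (1 - wx) * (1 - wy) + img[y0, x1] * wx * (1 - wy)
            + img[y1, x0] * (1 - wx) * wy + img[y1, x1] * wx * wy)

img = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
warped = bilinear_sample(img, affine_grid(identity, 4, 4))
assert np.allclose(warped, img)  # identity transform leaves the image unchanged
```

In the full framework, theta is not fixed but predicted per image by the localization network, and because the sampler is differentiable, the whole module trains end-to-end with the downstream CNN.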