Weight-sharing multi-stage multi-scale ensemble convolutional neural network

Most existing convolutional neural networks (CNNs) ignore the multi-scale features of the input image to varying extents. As a result, they lack robustness to the feature scale of the input image, which limits the model's generalization ability. In addition, given large-scale data, CNNs generally require more layers and a huge number of parameters to achieve higher image classification accuracy, which raises the cost of network training. To this end, a Weight-Sharing Multi-Stage Multi-Scale Ensemble Convolutional Neural Network (WSMSMSE-CNN) is proposed in this paper. The input image is pooled several times to obtain multi-scale images, which are fed into a multi-stage network. Each stage is a multi-layer multi-scale ensemble network consisting of Conv Blocks, pooling layers, and dropout layers. Conv Blocks within the same stage are connected by pooling layers, while Conv Blocks at the same position in different stages share the same weights. In this way, the network captures both multi-scale features of the same image and scale features of multi-scale images. In addition, each large convolutional kernel is replaced by several consecutive small kernels, which keeps the receptive field unchanged while effectively controlling the number of parameters. Experimental results on the CIFAR-10 and CIFAR-100 datasets verify that WSMSMSE-CNN is not only robust but also requires fewer layers to obtain higher classification accuracy.
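To make the architecture concrete, the following is a minimal PyTorch sketch of the weight-sharing multi-stage multi-scale idea described above. It is an illustrative reconstruction under stated assumptions, not the paper's exact configuration: the module and parameter names (ConvBlock, width, num_blocks), the shared stem and classifier, and the use of average pooling to build the multi-scale inputs are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    # Two stacked 3x3 convolutions cover the same 5x5 receptive field
    # as a single 5x5 kernel while using 2*9*C^2 instead of 25*C^2
    # weights, matching the small-kernel substitution in the abstract.
    def __init__(self, channels, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p_drop),
        )

    def forward(self, x):
        return self.net(x)

class WSMSMSECNNSketch(nn.Module):
    def __init__(self, num_classes=10, width=64, num_blocks=3):
        super().__init__()
        self.stem = nn.Conv2d(3, width, 3, padding=1)    # shared RGB stem (assumption)
        self.blocks = nn.ModuleList(ConvBlock(width) for _ in range(num_blocks))
        self.classifier = nn.Linear(width, num_classes)  # shared head (assumption)
        self.num_blocks = num_blocks

    def forward(self, x):
        logits = []
        # Stage s consumes the input pooled s times and reuses blocks[s:],
        # so Conv Blocks at the same depth share weights across stages.
        for s in range(self.num_blocks):
            h = x
            for _ in range(s):                           # build the s-th scale
                h = F.avg_pool2d(h, 2)
            h = self.stem(h)
            for block in self.blocks[s:]:
                h = block(h)
                h = F.max_pool2d(h, 2)                   # pooling between blocks
            h = F.adaptive_avg_pool2d(h, 1).flatten(1)
            logits.append(self.classifier(h))
        return torch.stack(logits).mean(0)               # ensemble over stages

model = WSMSMSECNNSketch()
out = model(torch.randn(2, 3, 32, 32))                   # CIFAR-sized input -> (2, 10)

In this sketch the ensemble is a simple average of the per-stage logits; on a 32x32 CIFAR image the three stages see 32x32, 16x16, and 8x8 inputs, and all stages end at the same 4x4 resolution before global average pooling, which is what allows the deeper Conv Blocks to be shared verbatim.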
