COMO: Widening Deep Neural Networks with COnvolutional MaxOut

In this paper, we extend the classic MaxOut strategy, originally designed for Multi-Layer Perceptrons (MLPs), into COnvolutional MaxOut (COMO), a new strategy for making deep convolutional neural networks wider with parameter efficiency. Compared to existing solutions, such as ResNeXt for ResNet or Inception for VGG-like networks, COMO works well both on linear architectures and on those with skip connections and residual blocks. More specifically, COMO adopts a novel split-transform-merge paradigm that extends the layers performing spatial resolution reduction into multiple parallel splits. In a layer with COMO, each split passes the input feature maps through a 4D convolution operator with an independent batch normalization operator for transformation; the splits are then merged through max-pooling into an aggregated output of the original size. This strategy is expected to counteract the potential degradation of classification accuracy caused by spatial resolution reduction, by incorporating multiple splits and max-pooling-based feature selection. Our experiments with a wide range of deep architectures show that COMO significantly improves the classification accuracy of ResNet/VGG-like networks on a large number of benchmark datasets. COMO further outperforms existing widening solutions, e.g., Inception, ResNeXt, SE-ResNet, and Xception, and dominates the comparison of accuracy versus parameter size.
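
To make the layer design concrete, the following is a minimal PyTorch sketch of the split-transform-merge pattern described above, not the authors' reference implementation: each of K parallel splits applies a convolution (whose weight is a 4D tensor) with its own batch normalization, and the splits are merged by an element-wise max. The class name COMOLayer and the defaults (number of splits, 3x3 kernel, stride 2 for resolution reduction) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class COMOLayer(nn.Module):
    """Sketch of a COMO layer: K parallel conv+BN splits merged by maxout.

    Hypothetical class; the hyperparameters below are illustrative, not the
    paper's exact configuration.
    """

    def __init__(self, in_channels, out_channels, num_splits=2,
                 kernel_size=3, stride=2, padding=1):
        super().__init__()
        # Split/transform: each split is a convolution (a 4D weight tensor)
        # followed by its own, independent batch normalization.
        self.splits = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size,
                          stride=stride, padding=padding, bias=False),
                nn.BatchNorm2d(out_channels),
            )
            for _ in range(num_splits)
        ])

    def forward(self, x):
        # Run the input through every split in parallel: (K, N, C, H, W).
        outs = torch.stack([split(x) for split in self.splits], dim=0)
        # Merge: an element-wise max across splits keeps the strongest
        # response per position, preserving the single-split output size.
        return outs.max(dim=0).values

# Example: widening a stride-2 (resolution-reducing) layer four ways.
layer = COMOLayer(64, 128, num_splits=4)
y = layer(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 128, 28, 28])
```

Widening by K splits multiplies the layer's convolution parameters by K, but the merged output keeps the original size, so all downstream layers are unchanged.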

[1] Matthieu Guillaumin, et al. Food-101 - Mining Discriminative Components with Random Forests, 2014, ECCV.

[2] Bolei Zhou, et al. Network Dissection: Quantifying Interpretability of Deep Visual Representations, 2017, CVPR.

[3] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.

[4] Fei Su, et al. Improving deep neural networks with multilayer maxout networks, 2014, IEEE Visual Communications and Image Processing Conference (VCIP).

[5] Antonio Torralba, et al. Recognizing indoor scenes, 2009, CVPR.

[6] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.

[7] Fei-Fei Li, et al. Novel Dataset for Fine-Grained Image Categorization: Stanford Dogs, 2012.

[8] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[9] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, CVPR.

[10] George Cybenko. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.

[11] Wei Li, et al. WebVision Challenge: Visual Learning and Understanding With Web Data, 2017, arXiv.

[12] Simon Haykin, et al. Gradient-Based Learning Applied to Document Recognition, 2001.

[13] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[14] Enhua Wu, et al. Squeeze-and-Excitation Networks, 2017, IEEE Trans. Pattern Anal. Mach. Intell.

[15] Brian Kan-Wing Mak, et al. End-To-End Low-Resource Lip-Reading with Maxout CNN and LSTM, 2018, ICASSP.

[16] Xuhong Li, et al. Explicit Inductive Bias for Transfer Learning with Convolutional Networks, 2018, ICML.

[17] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[18] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, CVPR.

[19] Frédéric Jurie, et al. CentralNet: a Multilayer Approach for Multimodal Fusion, 2018, ECCV Workshops.

[20] Pietro Perona, et al. The Caltech-UCSD Birds-200-2011 Dataset, 2011.

[21] Quoc V. Le, et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, 2018, arXiv.

[22] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.

[23] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, CVPR.

[24] François Chollet. Xception: Deep Learning with Depthwise Separable Convolutions, 2016, CVPR.

[25] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[26] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.

[27] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[28] Yoshua Bengio, et al. Maxout Networks, 2013, ICML.

[29] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.