PydMobileNet: Pyramid Depthwise Separable Convolution Networks for Image Classification

Convolutional neural networks (CNNs) have shown remarkable performance in various computer vision tasks in recent years. However, their increasing model size makes them difficult to adopt in real-time applications and in mobile and embedded vision applications. Many works try to build networks that are as small as possible while still achieving acceptable performance. A state-of-the-art architecture of this kind is MobileNets, which use Depthwise Separable Convolution (DWConvolution) in place of standard convolution to reduce network size. This paper describes an improved version of MobileNet, called Pyramid Mobile Network (PydMobileNet). Instead of using only a 3 × 3 kernel for DWConvolution as in MobileNet, the proposed network uses a pyramid of kernel sizes to capture more spatial information. The proposed architecture is evaluated on two highly competitive object recognition benchmark datasets, CIFAR-10 and CIFAR-100. The experiments demonstrate that the proposed network achieves better performance than MobileNet as well as other state-of-the-art networks. Additionally, it is more flexible than MobileNets in fine-tuning the trade-off between accuracy, latency, and model size.
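To make the size argument concrete, the following is a minimal sketch that counts the weights of a standard convolution, a depthwise separable convolution, and a pyramid variant of the latter. The abstract does not state which kernel sizes form the pyramid or how the branches are combined, so the kernel sizes (3, 5, 7), the two merge modes ("concat" and "add"), and all function names below are illustrative assumptions rather than the authors' configuration. Biases and normalization parameters are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(c_in, c_out, k):
    """Weights in a depthwise separable convolution:
    one k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def pyramid_dw_separable_params(c_in, c_out, kernel_sizes, merge="concat"):
    """Weights in a hypothetical pyramid depthwise separable block.

    Assumption: one depthwise branch per kernel size, applied to the same
    input. The branches are either concatenated along the channel axis
    ("concat") or summed element-wise ("add") before the 1 x 1 pointwise
    convolution. The abstract does not specify either detail.
    """
    depthwise = sum(k * k * c_in for k in kernel_sizes)
    if merge == "concat":
        pointwise_in = c_in * len(kernel_sizes)
    elif merge == "add":
        pointwise_in = c_in
    else:
        raise ValueError("merge must be 'concat' or 'add'")
    return depthwise + pointwise_in * c_out


if __name__ == "__main__":
    c_in, c_out = 64, 128  # illustrative channel counts
    print("standard 3x3:      ", standard_conv_params(c_in, c_out, 3))
    print("depthwise sep. 3x3:", dw_separable_params(c_in, c_out, 3))
    for mode in ("concat", "add"):
        n = pyramid_dw_separable_params(c_in, c_out, (3, 5, 7), merge=mode)
        print(f"pyramid (3,5,7) {mode}:", n)
```

Under these assumptions the pyramid block costs more than a single 3 × 3 depthwise separable layer but remains well below a standard 3 × 3 convolution, which is consistent with the accuracy versus model size trade-off the abstract describes.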
