A Lite Asymmetric DenseNet for effective object detection based on convolutional neural networks (CNN)

Convolutional neural networks (CNNs) have recently been widely used in object detection and image recognition because of their effectiveness. Many highly accurate CNN-based classification models have been developed for various machine learning applications, but they are generally computationally costly and require hardware with substantial computing power and memory to run. To perform object detection accurately and efficiently with a CNN on a resource-limited system such as a mobile device, we propose a lightweight variant of DenseNet called Lite Asymmetric DenseNet (LA-DenseNet). To reduce model complexity, we replace the 7 x 7 convolution and 3 x 3 max-pool in the initial down-sampling stage with multiple 3 x 3 convolutions and a 2 x 2 max-pool, which significantly reduces the computing cost. In the dense blocks, channel splitting and channel shuffling are employed to enhance the exchange of information between feature maps and to improve the expressive ability of the network. We also decompose the 3 x 3 convolution in each dense block into a combination of 3 x 1 and 1 x 3 convolutions; these asymmetric convolutions speed up computation and extract more spatial features. To evaluate the proposed approach, we build an experimental system in which LA-DenseNet extracts features and the Single Shot MultiBox Detector (SSD) detects objects. With VOC2007+12 as the training and testing datasets, our model achieves detection accuracy comparable to that of YOLOv2 at a fraction of its computational cost and memory usage.
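As a rough illustration of two ideas mentioned above, the following sketch counts multiply-accumulate operations for a 3 x 3 convolution versus a 3 x 1 followed by a 1 x 3 convolution, and shows one common way to shuffle channels across groups. It is not the authors' implementation: the layer shape (32 channels, 38 x 38 feature map), the helper names `conv_macs` and `channel_shuffle`, and the choice of two groups are all assumptions made for the example.

```python
def conv_macs(k_h, k_w, c_in, c_out, out_h, out_w):
    """Multiply-accumulate count of one standard convolution (bias ignored)."""
    return k_h * k_w * c_in * c_out * out_h * out_w


def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style shuffle)."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    # Equivalent to viewing the list as (groups, per_group),
    # transposing to (per_group, groups), and flattening.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Hypothetical layer shape, chosen only for illustration.
c_in, c_out, h, w = 32, 32, 38, 38

full = conv_macs(3, 3, c_in, c_out, h, w)
asym = (conv_macs(3, 1, c_in, c_out, h, w)
        + conv_macs(1, 3, c_out, c_out, h, w))
ratio = asym / full  # cost of the asymmetric pair relative to 3 x 3

# Channel indices 0..7 split into 2 groups, then shuffled.
shuffled = channel_shuffle(list(range(8)), groups=2)

print(f"3x3 MACs: {full}, 3x1 + 1x3 MACs: {asym}, ratio: {ratio:.3f}")
print("shuffled channel order:", shuffled)
```

Under these assumptions (equal input and output channel counts), the asymmetric pair needs 6 kernel weights per channel pair instead of 9, so its cost is two thirds of the 3 x 3 convolution. The actual savings in LA-DenseNet depend on the channel widths used in the paper, which the abstract does not state.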
