Research on Optimization Method of Multi-scale Fish Target Fast Detection Network

Fish target detection suffers from a lack of high-quality datasets, and achieving real-time, low-power detection on embedded devices requires a difficult trade-off between computation speed and recognition ability. To address this, this paper collects and annotates a dataset named “Aquarium Fish” containing 10,042 images of 84 fish species and, based on this dataset, proposes a multi-scale-input fast fish target detection network (BTP-YOLOv3) together with its optimization method. First, depthwise convolutions are used to redesign the backbone of the YOLOv4 network, which reduces the amount of computation by 94.1% at a test accuracy of 92.34%. Then, training is augmented with MixUp, CutMix, and Mosaic, raising the test accuracy by 1.27%. Finally, the Mish, Swish, and ELU activation functions are applied, adding a further 0.76%. With these optimizations, testing the network on 2,000 fish images reaches an accuracy of 94.37%, while the computational complexity of the network is only 5.47 BFLOPS. Compared with the YOLOv3, YOLOv4, MobileNetV2-YOLOv3, and YOLOv3-tiny networks trained by transfer learning on the same dataset, BTP-YOLOv3 has fewer model parameters, faster computation, and lower energy consumption during operation while preserving accuracy, providing a useful reference for the practical application of neural networks.
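
As an illustration of the backbone redesign described above, the sketch below shows a depthwise separable convolution block (a 3x3 depthwise convolution followed by a 1x1 pointwise convolution) with the activation function selectable among Mish, Swish (SiLU), and ELU, the three functions compared in the abstract. The paper's BTP-YOLOv3 code is not reproduced here; the channel sizes, module name, and block structure are assumptions chosen only to demonstrate the technique in PyTorch.

# Illustrative sketch only: not the authors' implementation of BTP-YOLOv3.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv + 1x1 pointwise conv. For 3x3 kernels this substitution
    cuts a standard convolution's cost by roughly a factor of 8-9."""

    def __init__(self, in_ch, out_ch, stride=1, activation="mish"):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 conv mixes channels and sets the output width.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Activations compared in the paper; nn.Mish needs PyTorch >= 1.9,
        # and nn.SiLU is Swish with beta = 1.
        acts = {"mish": nn.Mish(), "swish": nn.SiLU(), "elu": nn.ELU()}
        self.act = acts[activation]

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        x = self.act(self.bn2(self.pointwise(x)))
        return x

if __name__ == "__main__":
    block = DepthwiseSeparableConv(32, 64, stride=2, activation="mish")
    out = block(torch.randn(1, 32, 416, 416))
    print(out.shape)  # torch.Size([1, 64, 208, 208])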

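The data augmentations mentioned above can likewise be sketched in a few lines. The following is a minimal NumPy illustration of MixUp and CutMix on image arrays, not the paper's training pipeline; the alpha values are assumptions, and the detection-specific bounding-box handling (keeping both images' boxes for MixUp and Mosaic, clipping boxes to the pasted region for CutMix) is omitted.

# Minimal augmentation sketch; box bookkeeping for detection is omitted.
import numpy as np

def mixup(img_a, img_b, alpha=1.5):
    """Blend two images with a Beta-sampled weight lam."""
    lam = np.random.beta(alpha, alpha)
    mixed = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
    return mixed, lam

def cutmix(img_a, img_b, alpha=1.0):
    """Paste a random rectangle of img_b onto img_a; lam is the kept-area ratio."""
    h, w = img_a.shape[:2]
    lam = np.random.beta(alpha, alpha)
    cut_w, cut_h = int(w * np.sqrt(1 - lam)), int(h * np.sqrt(1 - lam))
    cx, cy = np.random.randint(w), np.random.randint(h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    lam = 1.0 - (x2 - x1) * (y2 - y1) / (w * h)  # actual kept-area ratio
    return mixed, lam
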