Research on Optimization Method of Multi-scale Fish Target Fast Detection Network

Fish target detection suffers from a lack of high-quality datasets, and achieving real-time, low-power detection on embedded devices requires a difficult trade-off between computation speed and recognition ability. To address this, this paper collects and annotates a dataset named “Aquarium Fish” containing 10,042 images of 84 fish species and, based on this dataset, proposes a multi-scale-input fast fish target detection network (BTP-YOLOv3) together with its optimization method. First, depthwise convolutions are used to redesign the backbone of the YOLOv4 network, which reduces the amount of computation by 94.1% at a test accuracy of 92.34%. Then, training is augmented with MixUp, CutMix, and Mosaic, raising the test accuracy by 1.27%. Finally, the Mish, Swish, and ELU activation functions are applied, adding a further 0.76%. With these optimizations, testing the network on 2,000 fish images reaches an accuracy of 94.37%, while the computational complexity of the network is only 5.47 BFLOPS. Compared with the YOLOv3, YOLOv4, MobileNetV2-YOLOv3, and YOLOv3-tiny networks trained by transfer learning on the same dataset, BTP-YOLOv3 has fewer model parameters, faster computation, and lower energy consumption during operation while preserving accuracy, providing a useful reference for the practical application of neural networks.
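
As an illustration of the backbone redesign described above, the sketch below shows a depthwise separable convolution block (a 3x3 depthwise convolution followed by a 1x1 pointwise convolution) with the activation function selectable among Mish, Swish (SiLU), and ELU, the three functions compared in the abstract. The paper's BTP-YOLOv3 code is not reproduced here; the channel sizes, module name, and block structure are assumptions chosen only to demonstrate the technique in PyTorch.

# Illustrative sketch only: not the authors' implementation of BTP-YOLOv3.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv + 1x1 pointwise conv. For 3x3 kernels this substitution
    cuts a standard convolution's cost by roughly a factor of 8-9."""

    def __init__(self, in_ch, out_ch, stride=1, activation="mish"):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 conv mixes channels and sets the output width.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Activations compared in the paper; nn.Mish needs PyTorch >= 1.9,
        # and nn.SiLU is Swish with beta = 1.
        acts = {"mish": nn.Mish(), "swish": nn.SiLU(), "elu": nn.ELU()}
        self.act = acts[activation]

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        x = self.act(self.bn2(self.pointwise(x)))
        return x

if __name__ == "__main__":
    block = DepthwiseSeparableConv(32, 64, stride=2, activation="mish")
    out = block(torch.randn(1, 32, 416, 416))
    print(out.shape)  # torch.Size([1, 64, 208, 208])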

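The data augmentations mentioned above can likewise be sketched in a few lines. The following is a minimal NumPy illustration of MixUp and CutMix on image arrays, not the paper's training pipeline; the alpha values are assumptions, and the detection-specific bounding-box handling (keeping both images' boxes for MixUp and Mosaic, clipping boxes to the pasted region for CutMix) is omitted.

# Minimal augmentation sketch; box bookkeeping for detection is omitted.
import numpy as np

def mixup(img_a, img_b, alpha=1.5):
    """Blend two images with a Beta-sampled weight lam."""
    lam = np.random.beta(alpha, alpha)
    mixed = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
    return mixed, lam

def cutmix(img_a, img_b, alpha=1.0):
    """Paste a random rectangle of img_b onto img_a; lam is the kept-area ratio."""
    h, w = img_a.shape[:2]
    lam = np.random.beta(alpha, alpha)
    cut_w, cut_h = int(w * np.sqrt(1 - lam)), int(h * np.sqrt(1 - lam))
    cx, cy = np.random.randint(w), np.random.randint(h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    lam = 1.0 - (x2 - x1) * (y2 - y1) / (w * h)  # actual kept-area ratio
    return mixed, lam
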