Real-Time Pear Fruit Detection and Counting Using YOLOv4 Models and Deep SORT

This study aimed to produce a robust real-time pear fruit counter for mobile applications using only RGB data, variants of the state-of-the-art object detection model YOLOv4, and the multiple-object tracking algorithm Deep SORT. It also provides a systematic and pragmatic methodology for choosing the most suitable model for a given application in agricultural sciences. In terms of accuracy, YOLOv4-CSP was the optimal model, with an AP@0.50 of 98%. In terms of speed and computational cost, YOLOv4-tiny was the ideal model, with a speed of more than 50 FPS and FLOPS of 6.8–14.5. When accuracy, speed, and computational cost are balanced against one another, YOLOv4 was found to be the most suitable, having the highest accuracy metrics while satisfying a real-time speed of at least 24 FPS. Of the two methods of counting with Deep SORT, the unique ID method was found to be more reliable, with an F1count of 87.85%. This was because YOLOv4 had a very low false negative rate in detecting pear fruits. The ROI line method is more restrictive, which in principle makes it more reliable, but flickering in detection caused it to miss some pears even though they had been detected.
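The difference between the two counting methods can be illustrated with a minimal Python sketch. It is not the study's implementation: it assumes the tracker output is already available as per-frame lists of (track ID, centroid y) pairs, and the function names, the horizontal ROI line at a fixed y value, and the toy data are all hypothetical. The F1 helper uses the standard precision/recall definition, which may differ in detail from how F1count was computed in the study.

```python
from typing import Dict, List, Set, Tuple

# Assumed tracker output: one list per frame of (track_id, centroid_y).
Frame = List[Tuple[int, float]]


def count_unique_ids(frames: List[Frame]) -> int:
    """Unique ID method: every distinct track ID counts as one fruit."""
    return len({track_id for frame in frames for track_id, _ in frame})


def count_line_crossings(frames: List[Frame], line_y: float) -> int:
    """ROI line method: count a track once, when its centroid moves
    across the horizontal line y = line_y between two observations."""
    last_y: Dict[int, float] = {}
    counted: Set[int] = set()
    for frame in frames:
        for track_id, y in frame:
            prev = last_y.get(track_id)
            crossed = prev is not None and min(prev, y) < line_y <= max(prev, y)
            if crossed and track_id not in counted:
                counted.add(track_id)
            last_y[track_id] = y
    return len(counted)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Standard F1 score from true positive, false positive and
    false negative counts; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: track 2 crosses the line at y = 50; track 1 is detected
# but its detection drops out before it reaches the line.
frames: List[Frame] = [
    [(1, 30.0), (2, 45.0)],
    [(1, 40.0), (2, 55.0)],
    [(1, 48.0)],
    [],
]

print(count_unique_ids(frames))            # unique ID count
print(count_line_crossings(frames, 50.0))  # ROI line count
```

In this toy sequence both pears receive a track ID, so the unique ID method counts both, while the line method registers only the track that is observed on both sides of the line. This mirrors the behaviour described above, where a pear can be detected yet go uncounted by the ROI line.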
