Simple Fusion of Object Detectors for Improved Performance and Faster Deployment

Object detectors often suffer from performance limitations that can be mitigated with larger training datasets, improved training techniques, and more complex detection models. However, such strategies are complex and time-consuming for applications that require fast deployment. We propose Simple Fusion of Object Detectors (SFOD), a late ensemble method that combines existing pre-trained, off-the-shelf, fine-tuned object detectors and leverages their divergences to improve overall detection performance. Comprehensive experimental evaluations on the PASCAL VOC07 challenge demonstrate SFOD’s ability to improve mean average precision (mAP) for different fusion sizes and base detector combinations, reaching an absolute 84.08% mAP and an improvement of 3.97% mAP. The improvements extend to most classes, fusion sizes, and base detector combinations, with AP improvements of up to 17.35% over baselines for particular object classes. Practical application evaluations, based on optimal threshold selection, also reveal improvements of 10.54% in mean recall (mR) and 8.36% in mAP. Our approach requires no additional training and is quickly deployable, while providing a few adjustable hyperparameters to optimize the recall-precision trade-off for specific applications. The improvements obtained with the proposed SFOD fusion pipeline span a broad range of object classes and are important for a wide variety of critical applications where every successful detection matters.
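To make the idea of late fusion concrete, the following is a minimal Python sketch of one generic way to combine the outputs of several already-trained detectors without retraining. It is not the SFOD pipeline, whose details the abstract does not give. The function names (`iou`, `fuse_detections`), the greedy IoU grouping, the score-weighted box averaging, and the default values of `iou_thr` and `score_thr` are all assumptions chosen for illustration; the two thresholds merely stand in for the kind of adjustable hyperparameters the abstract mentions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout
Detection = Tuple[Box, float, str]       # (box, confidence score, class label)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_detections(per_detector: List[List[Detection]],
                    iou_thr: float = 0.5,
                    score_thr: float = 0.3) -> List[Detection]:
    """Pool detections from all detectors and merge overlapping same-class boxes.

    Assumed scheme (not taken from the paper): detections below score_thr are
    dropped, the rest are visited in descending score order, and each unvisited
    detection absorbs every later same-class detection whose IoU with it is at
    least iou_thr. A group becomes one box: the score-weighted mean of its
    members' coordinates, carrying the highest score in the group.
    """
    pooled = [d for dets in per_detector for d in dets if d[1] >= score_thr]
    pooled.sort(key=lambda d: d[1], reverse=True)
    used = [False] * len(pooled)
    fused: List[Detection] = []
    for i, (box, score, label) in enumerate(pooled):
        if used[i]:
            continue
        used[i] = True
        group = [(box, score)]
        for j in range(i + 1, len(pooled)):
            other_box, other_score, other_label = pooled[j]
            if (not used[j] and other_label == label
                    and iou(box, other_box) >= iou_thr):
                used[j] = True
                group.append((other_box, other_score))
        total = sum(s for _, s in group)
        if total > 0:
            merged = tuple(sum(b[k] * s for b, s in group) / total
                           for k in range(4))
        else:
            # All scores are zero: fall back to an unweighted mean.
            merged = tuple(sum(b[k] for b, _ in group) / len(group)
                           for k in range(4))
        fused.append((merged, max(s for _, s in group), label))
    return fused
```

Under these assumptions, raising `score_thr` trades recall for precision, while `iou_thr` controls how aggressively boxes from different detectors are treated as the same object. A detection reported by only one detector survives as its own group, which is one way an ensemble can recover objects that individual detectors miss.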
