Object Detection Combining CNN and Adaptive Color Prior Features

When compared with the traditional manual design method, the convolutional neural network has the advantages of strong expressive ability and it is insensitive to scale, light, and deformation, so it has become the mainstream method in the object detection field. In order to further improve the accuracy of existing object detection methods based on convolutional neural networks, this paper draws on the characteristics of the attention mechanism to model color priors. Firstly, it proposes a cognitive-driven color prior model to obtain the color prior features for the known types of target samples and the overall scene, respectively. Subsequently, the acquired color prior features and test image color features are adaptively weighted and competed to obtain prior-based saliency images. Finally, the obtained saliency images are treated as features maps and they are further fused with those extracted by the convolutional neural network to complete the subsequent object detection task. The proposed algorithm does not need training parameters, has strong generalization ability, and it is directly fused with convolutional neural network features at the feature extraction stage, thus has strong versatility. Experiments on the VOC2007 and VOC2012 benchmark data sets show that the utilization of cognitive-drive color priors can further improve the performance of existing object detection algorithms.

[1]  Huajun Feng,et al.  Libra R-CNN: Towards Balanced Learning for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Yuxing Peng,et al.  ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Jieping Ye,et al.  Object Detection in 20 Years: A Survey , 2019, Proceedings of the IEEE.

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[7]  Larry S. Davis,et al.  Submodular Salient Region Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  C. Koch,et al.  Computational modelling of visual attention , 2001, Nature Reviews Neuroscience.

[9]  Jun Li,et al.  Learn to focus on objects for visual detection , 2019, Neurocomputing.

[10]  Holger Voos,et al.  A Survey of Computer Vision Methods for 2D Object Detection from Unmanned Aerial Vehicles , 2020, J. Imaging.

[11]  Zhi Tang,et al.  CBNet: A Novel Composite Backbone Network Architecture for Object Detection , 2019, AAAI.

[12]  Karanbir Singh Chahal,et al.  A Survey of Modern Object Detection Literature using Deep Learning , 2018, ArXiv.

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Nanning Zheng,et al.  Learning to Detect a Salient Object , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Benjamin L. Campbell,et al.  Visual attention, buying impulsiveness, and consumer behavior , 2018 .

[16]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Xiongwei Wu,et al.  Recent Advances in Deep Learning for Object Detection , 2019, Neurocomputing.

[18]  Tim K Marks,et al.  SUN: A Bayesian framework for saliency using natural statistics. , 2008, Journal of vision.

[19]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[20]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[21]  Quoc V. Le,et al.  EfficientDet: Scalable and Efficient Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[23]  C. V. Jawahar,et al.  A Multi-Space Approach to Zero-Shot Object Detection , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[24]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Xindong Wu,et al.  Object Detection With Deep Learning: A Review , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[28]  Huchuan Lu,et al.  Visual saliency detection based on Bayesian model , 2011, 2011 18th IEEE International Conference on Image Processing.

[29]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Bo Gong,et al.  Real-Time Detection for Wheat Head Applying Deep Neural Network , 2020, Sensors.

[31]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[32]  Shuxiao Li,et al.  A fast top-down visual attention method to accelerate template matching , 2014 .

[33]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[34]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[35]  Luke S. Zettlemoyer,et al.  AllenNLP: A Deep Semantic Natural Language Processing Platform , 2018, ArXiv.

[36]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[37]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[38]  Sinan Kalkan,et al.  Generating Positive Bounding Boxes for Balanced Training of Object Detectors , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[39]  M. Usher,et al.  Visual attention modulates the integration of goal-relevant evidence and not value , 2020, bioRxiv.

[40]  Khaled Shaalan,et al.  Speech Recognition Using Deep Neural Networks: A Systematic Review , 2019, IEEE Access.

[41]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[42]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Fuxiang Liu,et al.  Lightweight Feature Enhancement Network for Single-Shot Object Detection , 2021, Sensors.

[44]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[45]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Huchuan Lu,et al.  Saliency Detection via Graph-Based Manifold Ranking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.