Object Localization Based on Proposal Fusion

Traditional regression framework of object locali-zation such as Overfeat often suffers from the problem of inaccurate scoring due to the separate scoring of classification network and regression network upon inconsistent regions. To tackle this problem, in this paper, we propose a novel object localization framework based on multiple complementary region proposal methods from the view of classification rather than regression. On top of our framework, we first combine multiple complementary region proposals during both training and testing as a means of data augmentation to generate more dense and reliable proposals for fusion, then achieve optimal compromise between complexity and efficiency through category clustering for bounding box sharing among similar categories, and finally propose a dense proposal fusion approach to merge dense region proposals near true object for fine-tuning of the final bounding box's coordinates and updating the confidence of fused proposals for final decision. Extensive experiments on the well-known large scale ILSVRC 2015 LOC dataset verify the effectiveness of our object localization framework.

[1]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[2]  Sheng Tang,et al.  Sparse Ensemble Learning for Concept Detection , 2012, IEEE Transactions on Multimedia.

[3]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[4]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[5]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[6]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  In-So Kweon,et al.  AttentionNet: Aggregating Weak Directions for Accurate Object Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[12]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Sheng Tang,et al.  Global-residual and Local-boundary Refinement Networks for Rectifying Scene Parsing Predictions , 2017, IJCAI.

[16]  M. Shyu,et al.  Florida International University and University of Miami TRECVID 2008 - High Level Feature Extraction , 2008, TRECVID.

[17]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[18]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[19]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[20]  Min-Chun Hu,et al.  Learning and Recognition of On-Premise Signs From Weakly Labeled Street View Images , 2014, IEEE Transactions on Image Processing.

[21]  B. S. Manjunath,et al.  Video Annotation Through Search and Graph Reinforcement Mining , 2010, IEEE Transactions on Multimedia.

[22]  Wen-Huang Cheng,et al.  A comparative study of data fusion for RGB-D based visual recognition , 2016, Pattern Recognit. Lett..

[23]  James M. Rehg,et al.  On the Design of Cascades of Boosted Ensembles for Face Detection , 2008, International Journal of Computer Vision.

[24]  Bernt Schiele,et al.  What Makes for Effective Detection Proposals? , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Vipin Kumar,et al.  Chameleon: Hierarchical Clustering Using Dynamic Modeling , 1999, Computer.

[26]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[27]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[28]  Ming-Syan Chen,et al.  MOSRO: Enabling Mobile Sensing for Real-Scene Objects with Grid Based Structured Output Learning , 2014, MMM.

[29]  Jian Sun,et al.  Object Detection Networks on Convolutional Feature Maps , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[31]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[32]  Johan A. K. Suykens,et al.  Least Squares Support Vector Machine Classifiers , 1999, Neural Processing Letters.

[33]  Cristian Sminchisescu,et al.  Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[34]  Jonathan Brandt,et al.  Robust object detection via soft cascade , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[35]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Cordelia Schmid,et al.  Combining efficient object localization and image classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[37]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[38]  Tao Mei,et al.  Contextual Bag-of-Words for Visual Categorization , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[39]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[40]  Sheng Tang,et al.  Image Caption with Global-Local Attention , 2017, AAAI.

[41]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[42]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[43]  Soumith Chintala,et al.  A MultiPath Network for Object Detection , 2016, BMVC.

[44]  Meng Wang,et al.  Visual Classification by ℓ1-Hypergraph Modeling , 2015, IEEE Trans. Knowl. Data Eng..

[45]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Fuchun Sun,et al.  HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Tomaso A. Poggio,et al.  A Trainable System for Object Detection , 2000, International Journal of Computer Vision.

[51]  Philip H. S. Torr,et al.  BING: Binarized normed gradients for objectness estimation at 300fps , 2019, Computational Visual Media.

[52]  Sheng Tang,et al.  An efficient concept detection system via sparse ensemble learning , 2015, Neurocomputing.

[53]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[54]  Nikos Komodakis,et al.  Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[55]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.