Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection

In this paper, we propose a novel object detection algorithm named "Deep Regionlets" by integrating deep neural networks and conventional detection schema for accurate generic object detection. Motivated by the advantages of regionlets on modeling object deformation and multiple aspect ratios, we incorporate regionlets into an end-to-end trainable deep learning framework. The deep regionlets framework consists of a region selection network and a deep regionlet learning module. Specifically, given a detection bounding box proposal, the region selection network provides guidance on where to select sub-regions from which features can be learned from. An object proposal typically contains 3-16 sub-regions. The regionlet learning module focuses on local feature selection and transformation to alleviate the effects of appearance variations. To this end, we first realize non-rectangular region selection within the detection framework to accommodate variations in object appearance. Moreover, we design a "gating network" within the regionlet leaning module to enable instance dependent soft feature selection and pooling. The Deep Regionlets framework is trained end-to-end without additional efforts. We present ablation studies and extensive experiments on the PASCAL VOC dataset and the Microsoft COCO dataset. The proposed method outperforms state-of-the-art algorithms, such as RetinaNet and Mask R-CNN, even without additional segmentation labels.

[1]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[3]  Bingbing Ni,et al.  Scale-Transferrable Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Ming Yang,et al.  Regionlets for Generic Object Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[5]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Xiangyu Zhang,et al.  DetNet: Design Backbone for Object Detection , 2018, ECCV.

[7]  Nikos Komodakis,et al.  Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Rama Chellappa,et al.  Deep Regionlets for Object Detection , 2017, ECCV.

[9]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Yichen Wei,et al.  Pseudo Mask Augmented Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Matthieu Cord,et al.  Deformable Part-based Fully Convolutional Network for Object Detection , 2017, BMVC.

[13]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[14]  Zhe Chen,et al.  Context Refinement for Object Detection , 2018, ECCV.

[15]  Hanqing Lu,et al.  CoupleNet: Coupling Global Structure with Local Parts for Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Cordelia Schmid,et al.  Towards Understanding Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[18]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[19]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[21]  Bo Wang,et al.  Single-Shot Object Detection with Enriched Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Bingbing Ni,et al.  HCP: A Flexible CNN Framework for Multi-Label Image Classification , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Yichen Wei,et al.  Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Larry S. Davis,et al.  An Analysis of Scale Invariance in Object Detection - SNIP , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[26]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Yichen Wei,et al.  Learning Region Features for Object Detection , 2018, ECCV.

[28]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[29]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[30]  S. Mallat A wavelet tour of signal processing , 1998 .

[31]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[32]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[33]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[34]  Heesung Kwon,et al.  Delving Into Robust Object Detection From Unmanned Aerial Vehicles: A Deep Nuisance Disentanglement Approach , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Rama Chellappa,et al.  Machine Recognition of Human Activities: A Survey , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[37]  Jinjun Xiong,et al.  Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection , 2018, ArXiv.

[38]  David A. McAllester,et al.  Cascade object detection with deformable part models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[39]  Carlos D. Castillo,et al.  A Fast and Accurate System for Face Detection, Identification, and Verification , 2018, IEEE Transactions on Biometrics, Behavior, and Identity Science.

[40]  Jinjun Xiong,et al.  Revisiting RCNN: On Awakening the Classification Power of Faster RCNN , 2018, ECCV.

[41]  B. K. Julsing,et al.  Face Recognition with Local Binary Patterns , 2012 .

[42]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[43]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[44]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[45]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[46]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[47]  Carlos D. Castillo,et al.  Crystal Loss and Quality Pooling for Unconstrained Face Verification and Recognition , 2018, ArXiv.

[48]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[49]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[50]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[51]  Nikos Komodakis,et al.  LocNet: Improving Localization Accuracy for Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[53]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Carlos D. Castillo,et al.  Deep Heterogeneous Feature Fusion for Template-Based Face Recognition , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[55]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[59]  Rama Chellappa,et al.  Learning a structured dictionary for video-based face recognition , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[60]  Li Fei-Fei,et al.  Towards total scene understanding: Classification, annotation and segmentation in an automatic framework , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Miao Sun,et al.  Generic Object Detection with Dense Neural Patterns and Regionlets , 2014, BMVC.

[62]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[63]  Fuchun Sun,et al.  Deep Feature Pyramid Reconfiguration for Object Detection , 2018, ECCV.

[64]  Larry S. Davis,et al.  SNIPER: Efficient Multi-Scale Training , 2018, NeurIPS.

[65]  Hao Wang,et al.  Multi-scale Location-Aware Kernel Representation for Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[66]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[67]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[68]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[69]  Yuning Jiang,et al.  Acquisition of Localization Confidence for Accurate Object Detection , 2018, ECCV.

[70]  Stéphane Mallat,et al.  A Wavelet Tour of Signal Processing, 2nd Edition , 1999 .

[71]  Bo Huang,et al.  Toward End-to-End Face Recognition Through Alignment Learning , 2017, IEEE Signal Processing Letters.

[72]  Azriel Rosenfeld,et al.  Face recognition: A literature survey , 2003, CSUR.

[73]  Larry S. Davis,et al.  An Analysis of Pre-Training on Object Detection , 2019, ArXiv.

[74]  Yunhong Wang,et al.  Receptive Field Block Net for Accurate and Fast Object Detection , 2017, ECCV.

[75]  Junjie Yan,et al.  Quantization Mimic: Towards Very Tiny CNN for Object Detection , 2018, ECCV.

[76]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[77]  Mun-Cheon Kang,et al.  Parallel Feature Pyramid Network for Object Detection , 2018, ECCV.

[78]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[79]  Xiaogang Wang,et al.  DeepID-Net: Object Detection with Deformable Part Based Convolutional Neural Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[80]  Larry S. Davis,et al.  Soft-NMS — Improving Object Detection with One Line of Code , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).