Paper Information - Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection

Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection

In this article, we propose a novel object detection algorithm named “Deep Regionlets” that integrates deep neural networks with a conventional detection schema for accurate generic object detection. Motivated by the effectiveness of regionlets for modeling object deformations and multiple aspect ratios, we incorporate regionlets into an end-to-end trainable deep learning framework. The deep regionlets framework consists of a region selection network and a deep regionlet learning module. Specifically, given a detection bounding box proposal, the region selection network provides guidance on where to select sub-regions from which features can be learned. An object proposal typically contains 3 to 16 sub-regions. The regionlet learning module focuses on local feature selection and transformations to alleviate the effects of appearance variations. To this end, we first realize non-rectangular region selection within the detection framework to accommodate variations in object appearance. Moreover, we design a “gating network” within the regionlet learning module to enable instance-dependent soft feature selection and pooling. The Deep Regionlets framework is trained end-to-end without additional effort. We present ablation studies and extensive experiments on the PASCAL VOC dataset and the Microsoft COCO dataset. The proposed method yields performance competitive with state-of-the-art algorithms, such as RetinaNet and Mask R-CNN, even without additional segmentation labels.
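
The abstract does not give the internals of the gating network, so the following is only a minimal sketch of what instance-dependent soft feature selection followed by pooling could look like. It assumes a hypothetical linear gate with a sigmoid activation and element-wise max pooling across regionlets; the function name `gated_regionlet_pool` and its parameters are illustrative and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_regionlet_pool(regionlet_feats, gate_weights, gate_bias):
    """Soft-select regionlet features with a gate, then max-pool them.

    regionlet_feats: list of R feature vectors, each of length C.
    gate_weights:    length-C weights of an assumed (hypothetical) linear gate.
    gate_bias:       scalar bias of that gate.
    Returns (pooled, gates): a length-C pooled vector and the R gate values.
    """
    gates = []
    for feat in regionlet_feats:
        # One scalar gate in (0, 1) per regionlet, computed from its own
        # features, so the selection depends on the input instance.
        score = sum(w * x for w, x in zip(gate_weights, feat)) + gate_bias
        gates.append(sigmoid(score))

    # Soft selection: scale each regionlet's features by its gate value.
    gated = [[g * x for x in feat] for g, feat in zip(gates, regionlet_feats)]

    # Pool across regionlets, channel by channel (max pooling assumed here).
    pooled = [max(channel) for channel in zip(*gated)]
    return pooled, gates


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]  # 3 regionlets, 2 channels
    pooled, gates = gated_regionlet_pool(feats, [0.0, 0.0], 0.0)
    print(gates)   # zero weights and bias give a gate of 0.5 for every regionlet
    print(pooled)
```

A gate close to 0 suppresses a regionlet and a gate close to 1 passes it through almost unchanged, which is one way to read "soft" selection as opposed to a hard keep-or-drop decision.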

R. Chellappa | Xiaoyu Wang | Zhou Ren | Xutao Lv | Hongyu Xu
