Multiple Object Detection by a Deformable Part-Based Model and an R-CNN

Multiple object detection is a key challenge in object detection, and feature extraction and occlusion handling are its two key elements. However, existing methods do not perform well in these respects when detecting multiple objects. The region-based convolutional network (R-CNN) has achieved great success in region-based feature extraction, and the part filters in a deformable part-based model (DPM) are well suited to detecting occluded objects. In this letter, we present a framework that integrates the R-CNN and the DPM for detecting multiple objects. In addition, we propose a new filter, based on the dense subgraph discovery algorithm, for refining the candidate proposals generated by the DPM. By combining these two models, we can detect each individual object in an image with high accuracy, especially when objects are close together. Unlike traditional methods, our framework can detect multiple objects belonging to various classes rather than only one typical class such as a person or a car. Experimental results on the PASCAL VOC dataset show that, in multiple object detection, our framework achieves better performance than either the R-CNN or the DPM alone.
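The abstract does not describe how the dense-subgraph filter is built, so the following is only a minimal sketch of one way such a filter could work, not the authors' method. It assumes that candidate proposals are graph nodes, that an edge joins two proposals whose intersection over union (IoU) reaches a threshold, and that a standard greedy peeling heuristic for the densest subgraph stands in for the dense subgraph discovery step. The names `iou`, `densest_subgraph`, and `iou_threshold` are hypothetical.

```python
from itertools import combinations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def densest_subgraph(boxes, iou_threshold=0.5):
    """Return indices of the proposals in the densest subgraph found by peeling.

    Assumption (not from the paper): edges link proposals whose IoU is at
    least iou_threshold, and density is edges divided by nodes.
    """
    n = len(boxes)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if iou(boxes[i], boxes[j]) >= iou_threshold:
            adj[i].add(j)
            adj[j].add(i)

    alive = set(range(n))
    edges = sum(len(neigh) for neigh in adj.values()) // 2
    best = set(alive)
    best_density = edges / n if n else 0.0

    # Greedy peeling: remove the lowest-degree node, remember the densest set seen.
    while len(alive) > 1:
        v = min(alive, key=lambda u: (len(adj[u]), u))
        edges -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        adj[v] = set()
        alive.remove(v)
        density = edges / len(alive)
        if density > best_density:
            best, best_density = set(alive), density
    return sorted(best)


if __name__ == "__main__":
    # Three mutually overlapping proposals and one isolated outlier.
    proposals = [(10, 10, 50, 50), (12, 12, 52, 52), (11, 9, 51, 49), (200, 200, 240, 240)]
    print(densest_subgraph(proposals))
```

In this toy input the isolated proposal has no edges, so peeling removes it first and the three overlapping proposals remain as the densest group. A real pipeline would also need per-class handling and detection scores, which this sketch leaves out.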
