Bottom-Up Segmentation for Top-Down Detection

In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model ``blends'' between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms Dalal & Triggs detector on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM in $19$ out of $20$ classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC 2010 test by 4%.

[1]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[2]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[3]  Ashutosh Saxena,et al.  Cascaded Classification Models: Combining Models for Holistic Scene Understanding , 2008, NIPS.

[4]  Joseph J. Lim,et al.  Recognition using regions , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Toby Sharp,et al.  Image segmentation with a bounding box prior , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6]  Charless C. Fowlkes,et al.  Discriminative Models for Multi-Class Object Layout , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[8]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Pushmeet Kohli,et al.  Graph Cut Based Inference with Co-occurrence Statistics , 2010, ECCV.

[10]  Philip H. S. Torr,et al.  What, Where and How Many? Combining Object Detectors and CRFs , 2010, ECCV.

[11]  William T. Freeman,et al.  Latent hierarchical structural learning for object detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Joost van de Weijer,et al.  Harmony potentials for joint classification and segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Jianbo Shi,et al.  Many-to-one contour matching for describing and discriminating object shape , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Long Zhu,et al.  Active Mask Hierarchies for Object Detection , 2010, ECCV.

[15]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[16]  Cristian Sminchisescu,et al.  Object Recognition by Sequential Figure-Ground Ranking , 2012, International Journal of Computer Vision.

[17]  Pietro Perona,et al.  Object detection and segmentation from joint embedding of parts and pixels , 2011, 2011 International Conference on Computer Vision.

[18]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[19]  Jordi Gonzàlez,et al.  A coarse-to-fine approach for fast deformable object detection , 2011, CVPR 2011.

[20]  David A. McAllester,et al.  Object Detection with Grammar Models , 2011, NIPS.

[21]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[22]  Subhransu Maji,et al.  Object segmentation by alignment of poselet activations to image contours , 2011, CVPR 2011.

[23]  Tianli Yu,et al.  Kernelized structural SVM learning for supervised object segmentation , 2011, CVPR 2011.

[24]  C. V. Jawahar,et al.  The truth about cats and dogs , 2011, 2011 International Conference on Computer Vision.

[25]  Jitendra Malik,et al.  Semantic segmentation using regions and parts , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[27]  Yi Yang,et al.  Layered Object Models for Image Segmentation , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Derek Hoiem,et al.  Learning to localize detected objects , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Sanja Fidler,et al.  Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Ivan Laptev,et al.  Object Detection Using Strongly-Supervised Deformable Part Models , 2012, ECCV.

[31]  Roozbeh Mottaghi,et al.  Augmenting deformable part models with irregular-shaped object patches , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Björn Ommer,et al.  Beyond Bounding-Boxes: Learning Object Shape by Model-Driven Grouping , 2012, ECCV.

[33]  Qiang Chen,et al.  Hierarchical matching with side information for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Jitendra Malik,et al.  Multi-component Models for Object Detection , 2012, ECCV.