Object detection and segmentation from joint embedding of parts and pixels

We present a new framework in which image segmentation, figure/ground organization, and object detection all appear as the result of solving a single grouping problem. This framework serves as a perceptual organization stage that integrates information from low-level image cues with that of high-level part detectors. Pixels and parts each appear as nodes in a graph whose edges encode both affinity and ordering relationships. We derive a generalized eigen-problem from this graph and read off an interpretation of the image from the solution eigenvectors. Combining an off-the-shelf top-down part-based person detector with our low-level cues and grouping formulation, we demonstrate improvements to object detection and segmentation.

[1]  Jianbo Shi,et al.  Many-to-one contour matching for describing and discriminating object shape , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Pietro Perona,et al.  The Fastest Pedestrian Detector in the West , 2010, BMVC.

[3]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Jianbo Shi,et al.  Object-Specific Figure-Ground Segregation , 2003, CVPR.

[6]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Stephen Gould,et al.  Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[8]  Alexei A. Efros,et al.  Improving Spatial Support for Objects via Multiple Segmentations , 2007, BMVC.

[9]  Alexei A. Efros,et al.  Closing the loop in scene interpretation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Joseph J. Lim,et al.  Recognition using regions , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[12]  Jianbo Shi,et al.  Spectral segmentation with multiscale graph decomposition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Jitendra Malik,et al.  Using contours to detect and localize junctions in natural images , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Charless C. Fowlkes,et al.  Multiresolution Models for Object Detection , 2010, ECCV.

[15]  Ralph Gross,et al.  Concurrent Object Recognition and Segmentation by Graph Partitioning , 2002, NIPS.

[16]  Jitendra Malik,et al.  Context by region ancestry , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[18]  Jianbo Shi,et al.  Segmentation given partial grouping constraints , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Jitendra Malik,et al.  From contours to regions: An empirical evaluation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Philip H. S. Torr,et al.  What , Where & How Many ? Combining Object Detectors and CRFs , 2010 .

[22]  Jitendra Malik,et al.  Learning affinity functions for image segmentation: combining patch-based and gradient-based approaches , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[23]  Jitendra Malik,et al.  Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[24]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[25]  Stella Yu,et al.  Angular Embedding: A Robust Quadratic Criterion , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Stella X. Yu,et al.  Angular embedding: From jarring intensity differences to perceived luminance , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Yi Yang,et al.  Layered object detection for multi-class segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[28]  Philip H. S. Torr,et al.  What, Where and How Many? Combining Object Detectors and CRFs , 2010, ECCV.

[29]  Subhransu Maji,et al.  Object segmentation by alignment of poselet activations to image contours , 2011, CVPR 2011.

[30]  Daphne Koller,et al.  Efficiently selecting regions for scene understanding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Michael Maire,et al.  Simultaneous Segmentation and Figure/Ground Organization Using Angular Embedding , 2010, ECCV.

[32]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[33]  Jitendra Malik,et al.  Learning a classification model for segmentation , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.