From contours to 3D object detection and pose estimation

This paper addresses view-invariant object detection and pose estimation from a single image. Whereas recent work focuses on object-centered representations built from point-based object features, we revisit the viewer-centered framework and use image contours as the basic features. Given training examples of arbitrary views of an object, we learn a sparse object model consisting of a few view-dependent shape templates. The shape templates are jointly used to detect object occurrences and estimate their 3D poses in a new image. Instrumental to this is our new mid-level feature, called bag of boundaries (BOB), which summarizes individual edges into more informative descriptors that help identify object boundaries amid background clutter. In inference, BOBs are placed on deformable grids in both the image and the shape templates, and then matched. The matching is formulated as a convex optimization problem that accommodates invariance to non-rigid, locally affine shape deformations. Evaluation on benchmark datasets demonstrates results competitive with the state of the art.
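To make the idea of summarizing edges on a grid concrete, the following is a minimal sketch and not the paper's actual BOB definition. It assumes, purely for illustration, that each grid cell is described by a normalized histogram of edge orientations, that the grid is regular rather than deformable, and that two cells are compared with an L1 distance. The names `bag_of_boundaries` and `l1_distance`, and all parameters, are hypothetical; the convex matching formulation is not reproduced here.

```python
import math


def bag_of_boundaries(edges, width, height, rows=4, cols=4, n_bins=8):
    """Summarize edge pixels as one orientation histogram per grid cell.

    Assumption: this is a simplified stand-in for a BOB descriptor.
    edges: iterable of (x, y, theta), with theta the edge orientation in radians.
    Returns a rows x cols x n_bins nested list; each non-empty cell sums to 1.
    """
    hist = [[[0.0] * n_bins for _ in range(cols)] for _ in range(rows)]
    for x, y, theta in edges:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore points that fall outside the image
        r = min(int(y / height * rows), rows - 1)
        c = min(int(x / width * cols), cols - 1)
        # Edge orientation is undirected, so fold angles into [0, pi).
        b = min(int((theta % math.pi) / math.pi * n_bins), n_bins - 1)
        hist[r][c][b] += 1.0
    # L1-normalize every cell that received at least one edge point.
    for row in hist:
        for cell in row:
            total = sum(cell)
            if total > 0:
                cell[:] = [v / total for v in cell]
    return hist


def l1_distance(h1, h2):
    """L1 distance between two cell histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Under these assumptions, comparing an image cell with a template cell reduces to `l1_distance(image_hist[r][c], template_hist[r][c])`; a full matcher would additionally search over grid deformations, which this sketch leaves out.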
