Efficient Object Localization and Pose Estimation with 3D Wireframe Models

We propose an efficient method for 3D object localization and fine-grained 3D pose estimation from a single 2D image. Our approach follows the classical paradigm of matching a 3D model to 2D observations. Our first contribution is a 3D object model composed of a set of 3D edge primitives learned from 2D object blueprints, which can be viewed as a 3D generalization of HOG features. The model defines a matching cost: a rigid-body transformation is applied to the 3D object model, the transformed model is projected onto the image plane, and the projection is matched against HOG features extracted from the input image. Our second contribution is a very efficient branch-and-bound algorithm for finding the 3D pose that maximizes the matching score. To this end, 3D integral images of quantized HOGs are used to evaluate, in constant time, the maximum attainable matching score of each individual model primitive. We applied our method to three different car datasets and achieved promising results, with the fastest test times under half a second.
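
The constant-time evaluation mentioned above rests on the general integral-image idea, which can be sketched as follows. This is a minimal illustration of a generic 3D summed-volume table, not the paper's implementation: the function names `integral_volume` and `box_sum` are hypothetical, and what the three axes hold (for example image row, image column, and a third quantized dimension) is an assumption made here for illustration only.

```python
def integral_volume(v):
    """Build a zero-padded 3D summed-volume table from a nested list v[i][j][k].

    Entry iv[a][b][c] holds the sum of v over the half-open box
    [0, a) x [0, b) x [0, c). The axis meaning is an assumption of this sketch.
    """
    n0, n1, n2 = len(v), len(v[0]), len(v[0][0])
    iv = [[[0] * (n2 + 1) for _ in range(n1 + 1)] for _ in range(n0 + 1)]
    for i in range(n0):
        for j in range(n1):
            for k in range(n2):
                # 3D inclusion-exclusion recurrence over the seven neighbours
                iv[i + 1][j + 1][k + 1] = (
                    v[i][j][k]
                    + iv[i][j + 1][k + 1] + iv[i + 1][j][k + 1] + iv[i + 1][j + 1][k]
                    - iv[i][j][k + 1] - iv[i][j + 1][k] - iv[i + 1][j][k]
                    + iv[i][j][k]
                )
    return iv


def box_sum(iv, lo, hi):
    """Sum of the original volume over the half-open box [lo, hi), in 8 lookups."""
    (a0, b0, c0), (a1, b1, c1) = lo, hi
    return (
        iv[a1][b1][c1]
        - iv[a0][b1][c1] - iv[a1][b0][c1] - iv[a1][b1][c0]
        + iv[a0][b0][c1] + iv[a0][b1][c0] + iv[a1][b0][c0]
        - iv[a0][b0][c0]
    )
```

Once the table is built, any axis-aligned box query costs eight lookups regardless of the box size. That property is what makes a per-primitive bound cheap to evaluate inside a branch-and-bound search, where the same kind of query is issued for many candidate regions.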
