Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models

This paper poses object category detection in images as a type of 2D-to-3D alignment problem, utilizing the large quantities of 3D CAD models that have been made publicly available online. Using the "chair" class as a running example, we propose an exemplar-based 3D category representation, which can explicitly model chairs of different styles as well as the large variation in viewpoint. We develop an approach to establish part-based correspondences between 3D CAD models and real photographs. This is achieved by (i) representing each 3D model using a set of view-dependent mid-level visual elements learned from synthesized views in a discriminative fashion, (ii) carefully calibrating the individual element detectors on a common dataset of negative images, and (iii) matching visual elements to the test image allowing for small mutual deformations but preserving the viewpoint and style constraints. We demonstrate the ability of our system to align 3D models with 2D objects in the challenging PASCAL VOC images, which depict a wide variety of chairs in complex scenes.

[1]  Edward Courtney,et al.  2 = 4 M , 1993 .

[2]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[3]  Cordelia Schmid,et al.  3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[4]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[5]  David G. Lowe,et al.  The viewpoint consistency constraint , 2015, International Journal of Computer Vision.

[6]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7]  Joseph L. Mundy,et al.  Object Recognition in the Geometric Era: A Retrospective , 2006, Toward Category-Level Object Recognition.

[8]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[10]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[12]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[14]  Andrew Zisserman,et al.  Smooth object retrieval using a bag of boundaries , 2011, 2011 International Conference on Computer Vision.

[15]  Torsten Sattler,et al.  Fast image-based localization using direct 2D-to-3D matching , 2011, 2011 International Conference on Computer Vision.

[16]  Jay Yagnik,et al.  The power of comparative reasoning , 2011, 2011 International Conference on Computer Vision.

[17]  Ronen Basri,et al.  Viewpoint-aware object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[18]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[19]  Marc Pollefeys,et al.  Large Scale Visual Geo-Localization of Images in Mountainous Terrain , 2012, ECCV.

[20]  Martial Hebert,et al.  Data-Driven Scene Understanding from 3D Models , 2012, BMVC.

[21]  Silvio Savarese,et al.  Estimating the aspect layout of object categories , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Jianxiong Xiao,et al.  Localizing 3D cuboids in single-view images , 2012, NIPS.

[23]  Sven J. Dickinson,et al.  3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model , 2012, NIPS.

[24]  Deva Ramanan,et al.  Analyzing 3D Objects in Cluttered Images , 2012, NIPS.

[25]  Peter V. Gehler,et al.  Teaching 3D geometry to deformable part models , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Alexei A. Efros,et al.  Unsupervised Discovery of Mid-Level Discriminative Patches , 2012, ECCV.

[27]  Jitendra Malik,et al.  Discriminative Decorrelation for Clustering and Classification , 2012, ECCV.

[28]  Tomasz Malisiewicz,et al.  A Gaussian Approximation of Feature Space for Fast Image Similarity , 2012 .

[29]  Kobus Barnard,et al.  Understanding Bayesian Rooms Using Composite 3D Object Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Bernt Schiele,et al.  Detailed 3D Representations for Object Recognition and Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  C. V. Jawahar,et al.  Blocks That Shout: Distinctive Parts for Scene Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Jonathon Shlens,et al.  Fast, Accurate Detection of 100,000 Object Classes on a Single Machine , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Antonio Torralba,et al.  Parsing IKEA Objects: Fine Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[34]  Alexei A. Efros,et al.  Mid-level Visual Element Discovery as Discriminative Mode Seeking , 2013, NIPS.

[35]  Silvio Savarese,et al.  Understanding Indoor Scenes Using 3D Geometric Phrases , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Larry S. Davis,et al.  Representing Videos Using Mid-level Discriminative Patches , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Mathieu Aubry,et al.  Painting-to-3D model alignment via discriminative visual elements , 2014, TOGS.

[38]  Pascal Fua,et al.  Worldwide Pose Estimation Using 3D Point Clouds , 2012, ECCV.