3DNN: 3D Nearest Neighbor

In this paper, we describe a data-driven approach to leverage repositories of 3D models for scene understanding. Our ability to relate what we see in an image to a large collection of 3D models allows us to transfer information from these models, creating a rich understanding of the scene. We develop a framework for auto-calibrating a camera, rendering 3D models from the viewpoint an image was taken, and computing a similarity measure between each 3D model and an input image. We demonstrate this data-driven approach in the context of geometry estimation and show the ability to find the identities, poses and styles of objects in a scene. The true benefit of 3DNN compared to a traditional 2D nearest-neighbor approach is that by generalizing across viewpoints, we free ourselves from the need to have training examples captured from all possible viewpoints. Thus, we are able to achieve comparable results using orders of magnitude less data, and recognize objects from never-before-seen viewpoints. In this work, we describe the 3DNN algorithm and rigorously evaluate its performance for the tasks of geometry estimation and object detection/segmentation, as well as two novel applications: affordance estimation and photorealistic object insertion.

[1]  Paul E. Debevec,et al.  Rendering synthetic objects into real scenes: bridging traditional and image-based graphics with global illumination and high dynamic range photography , 1998, SIGGRAPH '08.

[2]  Jana Kosecka,et al.  Nonparametric Scene Parsing with Adaptive Feature Relevance and Semantic Context , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  T. Kanade,et al.  Geometric reasoning for single image structure recovery , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Svetlana Lazebnik,et al.  Superparsing , 2010, International Journal of Computer Vision.

[5]  Scott Satkin Data-Driven Geometric Scene Understanding , 2013 .

[6]  Dima Damen,et al.  Egocentric Real-time Workspace Monitoring using an RGB-D camera , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[7]  William Grimson,et al.  Object recognition by computer - the role of geometric constraints , 1991 .

[8]  Raquel Urtasun,et al.  Efficient Exact Inference for 3D Indoor Scene Understanding , 2012, ECCV.

[9]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[10]  周志国 Google SketchUp 软件在工程投标中的应用 , 2014 .

[11]  Antonio Torralba,et al.  LabelMe: Online Image Annotation and Applications , 2010, Proceedings of the IEEE.

[12]  Antonio Torralba,et al.  Parsing IKEA Objects: Fine Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[13]  Antonio Torralba,et al.  Building the gist of a scene: the role of global image features in recognition. , 2006, Progress in brain research.

[14]  Pat Hanrahan,et al.  Characterizing structural relationships in scenes using graph kernels , 2011, ACM Trans. Graph..

[15]  Alexei A. Efros,et al.  People Watching: Human Actions as a Cue for Single View Geometry , 2012, International Journal of Computer Vision.

[16]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[17]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2007, SIGGRAPH 2007.

[18]  Stephen Gould,et al.  Discriminative learning with latent variables for cluttered indoor scene understanding , 2010, CACM.

[19]  Marc Pollefeys,et al.  Large Scale Visual Geo-Localization of Images in Mountainous Terrain , 2012, ECCV.

[20]  David A. Forsyth,et al.  Recovering free space of indoor scenes from a single image , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Martial Hebert,et al.  3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding , 2013, 2013 IEEE International Conference on Computer Vision.

[22]  Martial Hebert,et al.  Data-Driven 3D Primitives for Single Image Understanding , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Jitendra Malik,et al.  Inferring spatial layout from a single image via depth-ordered grouping , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[24]  Takeo Kanade,et al.  Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces , 2010, NIPS.

[25]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[28]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[29]  Martial Hebert,et al.  Data-Driven Scene Understanding from 3D Models , 2012, BMVC.

[30]  Silvio Savarese,et al.  Toward coherent object detection and scene layout understanding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  David G. Lowe,et al.  Three-Dimensional Object Recognition from Single Two-Dimensional Images , 1987, Artif. Intell..

[32]  Rama Chellappa,et al.  Fast directional chamfer matching , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[33]  H. C. Longuet-Higgins,et al.  A computer algorithm for reconstructing a scene from two projections , 1981, Nature.

[34]  Pat Hanrahan,et al.  Context-based search for 3D models , 2010, ACM Trans. Graph..

[35]  Kun Zhou,et al.  An interactive approach to semantic modeling of indoor scenes with an RGBD camera , 2012, ACM Trans. Graph..

[36]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[37]  Derek Hoiem,et al.  Single-image shadow detection and removal using paired regions , 2011, CVPR 2011.

[38]  Alexei A. Efros,et al.  Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[39]  Luc Van Gool,et al.  What makes a chair a chair? , 2011, CVPR 2011.

[40]  Michael F. Cohen,et al.  Image-Based Remodeling , 2013, IEEE Transactions on Visualization and Computer Graphics.

[41]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[42]  Neil A. Dodgson,et al.  Proceedings Ninth IEEE International Conference on Computer Vision , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[43]  Alexei A. Efros,et al.  From 3D scene geometry to human workspace , 2011, CVPR 2011.

[44]  Edward H. Adelson,et al.  Ground truth dataset and baseline evaluations for intrinsic image algorithms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[45]  Derek Hoiem,et al.  Recovering the spatial layout of cluttered rooms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[46]  Rodney A. Brooks,et al.  Symbolic Reasoning Among 3-D Models and 2-D Images , 1981, Artif. Intell..

[47]  Kalyan Sunkavalli,et al.  Automatic Scene Inference for 3D Object Compositing , 2014, ACM Trans. Graph..

[48]  Antonio Torralba,et al.  Inverting and Visualizing Features for Object Detection , 2012, ArXiv.

[49]  Dani Lischinski,et al.  Spectral Matting , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Silvio Savarese,et al.  Understanding Indoor Scenes Using 3D Geometric Phrases , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Taku Komura,et al.  Topology matching for fully automatic similarity estimation of 3D shapes , 2001, SIGGRAPH.

[53]  Dieter Fox,et al.  3D laser scan classification using web data and domain adaptation , 2009, Robotics: Science and Systems.

[54]  Hans-Peter Seidel,et al.  Automatic photo-to-terrain alignment for the annotation of mountain pictures , 2011, CVPR 2011.

[55]  Alexei A. Efros,et al.  Closing the loop in scene interpretation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Alexei A. Efros,et al.  Photo clip art , 2007, ACM Trans. Graph..

[57]  Daniel Fried,et al.  Bayesian geometric modeling of indoor scenes , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  David A. Forsyth,et al.  Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry , 2010, ECCV.

[59]  Martial Hebert,et al.  Stacked Hierarchical Labeling , 2010, ECCV.

[60]  David A. Forsyth,et al.  Rendering synthetic objects into legacy photographs , 2011, ACM Trans. Graph..

[61]  Klaus Obermayer,et al.  Support vector learning for ordinal regression , 1999 .

[62]  Song-Chun Zhu,et al.  Scene Parsing by Integrating Function, Geometry and Appearance Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Matthew Brand,et al.  SKYLINE2GPS: Localization in urban canyons using omni-skylines , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[64]  Joseph Schlecht,et al.  Sampling bedrooms , 2011, CVPR 2011.

[65]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Different Scenes , 2008, ECCV.

[66]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[67]  Kobus Barnard,et al.  Understanding Bayesian Rooms Using Composite 3D Object Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[68]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).