Building a database of 3D scenes from user annotations

We aim to build a high-quality database of images depicting scenes, together with their real-world three-dimensional (3D) coordinates. Such a database is useful for a variety of applications, including training object detection systems and validating 3D output. We build the database from images that have been annotated only with the identity of objects and their spatial extent in the image. Key to this task is recovering the geometric information that is implicit in the object labels, both qualitative (attachment, support, and occlusion relationships between objects) and quantitative (camera parameters). We describe a model that integrates cues extracted from the object labels to infer this implicit geometric information. We evaluate the proposed approach on a database obtained with a laser range scanner and show that it recovers high-quality 3D information. Finally, given the database of 3D scenes, we show how expanding it to unseen views through viewpoint interpolation yields better scene matches for an unlabeled image.
