Building a database of 3D scenes from user annotations

In this paper, we wish to build a high-quality database of images depicting scenes, along with their real-world three-dimensional (3D) coordinates. Such a database is useful for a variety of applications, including training systems for object detection and validating 3D output. We build such a database from images that have been annotated only with the identity of objects and their spatial extent in the images. Key to this task is recovering the geometric information that is implicit in the object labels, such as qualitative relationships between objects (attachment, support, occlusion) and quantitative ones (camera parameters). We describe a model that integrates cues extracted from the object labels to infer this implicit geometric information. We evaluate the proposed approach on a database obtained with a laser range scanner and show that it recovers high-quality 3D information. Finally, given the database of 3D scenes, we show how it can be used to find better scene matches for an unlabeled image by expanding the database through viewpoint interpolation to unseen views.
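
Standard single-view ground-plane geometry can illustrate the quantitative side of this inference. Once camera parameters are known, an object labeled as standing on the ground can be placed in 3D using the image row where its polygon touches the ground. The sketch below is not the paper's model. It assumes a pinhole camera with zero roll and an optical axis parallel to the ground, so the horizon passes through the principal point. The names `Camera`, `ground_point_3d`, and `object_height` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera whose optical axis is parallel to the ground.

    Under this assumption the horizon line is the image row through the
    principal point. Image rows grow downward.
    """
    f: float        # focal length in pixels (assumed known)
    cx: float       # principal-point column
    horizon: float  # image row of the horizon line
    height: float   # camera height above the ground plane (e.g. meters)


def ground_point_3d(cam: Camera, u: float, v: float) -> tuple[float, float, float]:
    """Back-project an image point (u, v) assumed to touch the ground plane.

    Returns (X, Y, Z) in camera coordinates (X right, Y down, Z forward);
    every ground point has Y == cam.height.
    """
    dv = v - cam.horizon
    if dv <= 0:
        # Points on or above the horizon cannot be ground contacts in front of the camera.
        raise ValueError("ground contact must lie below the horizon")
    z = cam.f * cam.height / dv   # depth from similar triangles: v - horizon = f * height / Z
    x = (u - cam.cx) * z / cam.f  # lateral offset at that depth
    return x, cam.height, z


def object_height(cam: Camera, v_bottom: float, v_top: float) -> float:
    """Height of an upright object standing on the ground.

    Assumes the object's top lies directly above its ground contact, so both
    share the same depth Z; this gives H = (v_bottom - v_top) * Z / f.
    """
    _, _, z = ground_point_3d(cam, cam.cx, v_bottom)
    return (v_bottom - v_top) * z / cam.f
```

For example, with a focal length of 1000 pixels, the horizon at row 240, and the camera 1.5 m above the ground, a contact point at row 540 lies 5 m away. The same relation shows that an object whose top reaches the horizon is as tall as the camera is high.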
