Recovering Surface Layout from an Image

Humans have an amazing ability to instantly grasp the overall 3D structure of a scene (ground orientation, relative positions of major landmarks, etc.) even from a single image. This ability is completely missing in most popular recognition algorithms, which pretend that the world is flat and/or view it through a patch-sized peephole. Yet it seems very likely that having a grasp of this “surface layout” of a scene should be of great assistance for many tasks, including recognition, navigation, and novel view synthesis. In this paper, we take the first step towards constructing the surface layout, a labeling of the image into geometric classes. Our main insight is to learn appearance-based models of these geometric classes, which coarsely describe the 3D scene orientation of each image region. Our multiple segmentation framework provides robust spatial support, allowing a wide variety of cues (e.g., color, texture, and perspective) to contribute to the confidence in each geometric label. In experiments on a large set of outdoor images, we evaluate the impact of the individual cues and design choices in our algorithm. We further demonstrate the applicability of our method to indoor images, describe potential applications, and discuss extensions to a more complete notion of surface layout.
