Depth Estimation from Image Structure

In the absence of cues for absolute depth measurement such as binocular disparity, motion, or defocus, the absolute distance between the observer and a scene cannot be measured. The interpretation of shading, edges, and junctions may provide a 3D model of the scene, but it will not provide information about the actual "scale" of the space. One possible source of information for absolute depth estimation is the image size of known objects. However, object recognition under unconstrained conditions remains difficult and unreliable for current computational approaches. We propose a source of information for absolute depth estimation that is based on the whole scene structure and does not rely on specific objects. We demonstrate that, by recognizing the properties of the structures present in the image, we can infer the scale of the scene and, therefore, its absolute mean depth. We illustrate the usefulness of computing the mean depth of the scene with applications to scene recognition and object detection.
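The known-object cue mentioned above can be made concrete with a minimal sketch. It assumes an ideal pinhole camera, which the abstract does not specify, and the function name, parameter names, and numbers are hypothetical choices for illustration only. Under that assumption, an object of real height H that spans h pixels in an image taken with focal length f (in pixels) lies at a distance of roughly f * H / h.

```python
def depth_from_known_size(focal_length_px: float,
                          real_height_m: float,
                          image_height_px: float) -> float:
    """Estimate the distance (in meters) to an object of known physical height.

    Assumes an ideal pinhole camera (an assumption of this sketch, not a
    statement about the paper's method): depth = f * H / h.
    """
    if focal_length_px <= 0 or real_height_m <= 0 or image_height_px <= 0:
        raise ValueError("all inputs must be positive")
    return focal_length_px * real_height_m / image_height_px


if __name__ == "__main__":
    # Hypothetical values: a 1.7 m tall person spanning 170 pixels,
    # seen through a camera with a focal length of 1000 pixels.
    print(depth_from_known_size(1000.0, 1.7, 170.0))
```

With these hypothetical values the estimate is about 10 m. The sketch also shows why the cue is fragile: it needs the object to be recognized and its physical size to be known, which is the limitation that motivates using whole-scene structure instead.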
