Learning Depth from Single Monocular Images

We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem: we first collect a training set of monocular images (of unstructured outdoor environments including forests, trees, and buildings) and their corresponding ground-truth depthmaps, and then apply supervised learning to predict the depthmap as a function of the image. Depth estimation is a challenging problem because local features alone are insufficient to estimate depth at a point; one also needs to consider the global context of the image. Our model uses a discriminatively trained Markov Random Field (MRF) that incorporates multiscale local and global image features, and models both the depths at individual points and the relations between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps.
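
To illustrate how an MRF can combine per-point depth estimates with relations between neighboring depths, the following is a minimal sketch and not the authors' implementation. It assumes quadratic (Gaussian) potentials on a 4-connected grid, and it assumes the per-point estimates have already been produced by some feature-based predictor; the abstract specifies neither the potentials nor the features. The names `smooth_depths`, `local_pred`, `sigma_local`, and `sigma_pair` are hypothetical.

```python
def smooth_depths(local_pred, sigma_local=1.0, sigma_pair=0.5, sweeps=200):
    """Approximate MAP depths for a toy Gaussian MRF on a 4-connected grid.

    Minimizes, by Gauss-Seidel coordinate updates,
        sum_i (d_i - u_i)^2 / (2 * sigma_local^2)
      + sum_{neighbors i,j} (d_i - d_j)^2 / (2 * sigma_pair^2)
    where u_i = local_pred[r][c] is an assumed per-point depth estimate.
    """
    rows, cols = len(local_pred), len(local_pred[0])
    w_local = 1.0 / sigma_local ** 2  # weight of the per-point term
    w_pair = 1.0 / sigma_pair ** 2    # weight of each neighbor term
    depth = [row[:] for row in local_pred]  # start from the local estimates
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                # Current depths of the in-bounds 4-connected neighbors.
                nbrs = [
                    depth[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                ]
                # Closed-form minimizer of the objective in d_i alone.
                depth[r][c] = (
                    w_local * local_pred[r][c] + w_pair * sum(nbrs)
                ) / (w_local + w_pair * len(nbrs))
    return depth


if __name__ == "__main__":
    # A single outlier in otherwise flat local estimates gets pulled
    # toward its neighbors by the pairwise term.
    noisy = [[5.0, 5.0, 5.0], [5.0, 50.0, 5.0], [5.0, 5.0, 5.0]]
    for row in smooth_depths(noisy):
        print([round(v, 2) for v in row])
```

In this toy setup, a smaller `sigma_pair` enforces stronger agreement between neighboring depths, while a smaller `sigma_local` keeps the result closer to the per-point estimates.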
