Data-driven single image depth estimation using weighted median statistics

In this paper, we propose a data-driven approach that automatically estimates a plausible depth map from a single monocular image using weighted median statistics (WMS). Instead of relying on the complicated parametric models and learning frameworks typically employed by existing methods, we cast depth estimation as a simple yet effective statistical procedure that assigns perceptually plausible depth values to an input image according to a data-driven depth prior. Based on the assumption that similar scenes are likely to have similar depth structures, the depth prior is computed from the WMS of the k-nearest-neighbor 3D image pairs retrieved from a large 3D image repository. We show that the WMS captures the underlying depth structure of the input image well, even though the nearest-neighbor images are not tightly aligned with it in visual appearance. The depth map is then inferred from this prior using edge-aware image filtering, which yields a smooth depth map that preserves depth discontinuities. Experimental results demonstrate that our method outperforms state-of-the-art methods in both accuracy and efficiency.
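The two stages described above (a per-pixel weighted median over the k retrieved depth maps, then edge-aware refinement guided by the input image) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the neighbor depth maps are already resized to the input resolution, weights each neighbor by a Gaussian of a global descriptor distance (the hypothetical `neighbor_weights`), and uses a joint bilateral filter as a stand-in for the unspecified edge-aware filter. All function names are hypothetical.

```python
import math
from typing import List, Sequence

DepthMap = List[List[float]]  # row-major grid of depth values


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Lower weighted median: the smallest value whose cumulative weight,
    after sorting by value, reaches half of the total weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]  # only reached through floating-point round-off


def neighbor_weights(distances: Sequence[float], sigma: float = 1.0) -> List[float]:
    """Hypothetical weighting (assumption): a Gaussian of each neighbor's
    descriptor distance to the input, so closer neighbors count more."""
    return [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in distances]


def wms_depth_prior(neighbor_depths: Sequence[DepthMap],
                    weights: Sequence[float]) -> DepthMap:
    """Per-pixel weighted median over k same-sized neighbor depth maps."""
    height, width = len(neighbor_depths[0]), len(neighbor_depths[0][0])
    return [[weighted_median([d[y][x] for d in neighbor_depths], weights)
             for x in range(width)]
            for y in range(height)]


def joint_bilateral_refine(prior: DepthMap, guide: DepthMap, radius: int = 1,
                           sigma_s: float = 1.0, sigma_r: float = 0.1) -> DepthMap:
    """Stand-in edge-aware filter (assumption): smooth the prior with weights
    that fall off with spatial distance and with intensity difference in the
    guide image, so depth is not averaged across strong image edges."""
    height, width = len(prior), len(prior[0])
    out: DepthMap = []
    for y in range(height):
        row = []
        for x in range(width):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        w_s = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                        diff = guide[y][x] - guide[yy][xx]
                        w_r = math.exp(-(diff * diff) / (2.0 * sigma_r ** 2))
                        num += w_s * w_r * prior[yy][xx]
                        den += w_s * w_r
            row.append(num / den)  # den > 0: the center pixel always has weight 1
        out.append(row)
    return out


# Usage: two well-matched neighbors and one poorly matched outlier.
depths = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.2, 2.2], [2.8, 4.4]],
    [[9.0, 9.0], [9.0, 9.0]],
]
w = neighbor_weights([0.1, 0.2, 3.0])
prior = wms_depth_prior(depths, w)
print(prior)  # [[1.0, 2.0], [3.0, 4.0]]
refined = joint_bilateral_refine(prior, guide=[[0.0, 0.0], [1.0, 1.0]])
```

The weighted median returned here minimizes the sum of w_i |x - d_i| over the neighbor depths d_i, so a poorly matched neighbor with a small weight cannot pull the prior the way it would pull a weighted mean. That robustness is one reason a median is a natural choice when the retrieved neighbors are only loosely aligned with the input.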
