Depth extraction from a single image by sampling based on distance metric learning

In this paper, we propose a new depth sampling method for depth extraction from a single image based on a learned Mahalanobis distance (DSMD), rather than the traditional Euclidean distance (DSED). The metric is learned so that images whose 3D structure is similar to that of the query image are separated by a large margin from images whose 3D structure is dissimilar; the learned metric therefore measures the similarity of 3D structure between images better than the Euclidean distance does. We also propose a simple depth fusion method for the sampled images based on a Gaussian weighting function (DFGW). Experiments show that DSMD produces more accurate depth estimates of the query image than DSED, and that DFGW is fast and produces reasonable results. Combined with the depth fusion method based on energy function minimization [1], our DSMD method achieves state-of-the-art results on the Make3D dataset.
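
To make the sampling step concrete, below is a minimal sketch (not the authors' implementation) of retrieving candidate images under a learned Mahalanobis distance. The feature representation, the number of candidates k, and the function names are illustrative assumptions; the positive semidefinite matrix M is assumed to come from a large-margin metric learning objective such as LMNN [5].

```python
import numpy as np

def mahalanobis_distance(x, y, M):
    """Distance between feature vectors x and y under a learned PSD matrix M."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

def sample_candidates(query_feat, database_feats, M, k=7):
    """Return indices of the k database images closest to the query under the
    learned metric (the DSMD-style sampling step); k and the feature
    representation are illustrative, not values from the paper."""
    dists = np.array([mahalanobis_distance(query_feat, f, M) for f in database_feats])
    return np.argsort(dists)[:k]
```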

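Likewise, a minimal sketch of Gaussian-weighted depth fusion under stated assumptions: each candidate's depth map (already aligned to the query, e.g., via SIFT Flow [8]) is averaged per pixel with a weight that falls off as a Gaussian of the candidate's distance to the query. The bandwidth sigma and the array shapes are assumptions, not values from the paper.

```python
import numpy as np

def fuse_depths_gaussian(candidate_depths, candidate_dists, sigma=1.0):
    """Fuse k candidate depth maps (shape k x H x W) into one depth estimate
    using Gaussian weights computed from each candidate's distance to the query."""
    depths = np.asarray(candidate_depths, dtype=np.float64)   # (k, H, W)
    dists = np.asarray(candidate_dists, dtype=np.float64)     # (k,)
    weights = np.exp(-dists ** 2 / (2.0 * sigma ** 2))
    weights /= weights.sum()
    # Weighted per-pixel average over the k candidates.
    return np.tensordot(weights, depths, axes=(0, 0))         # (H, W)
```
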
[1] Brian Kulis, et al. Metric Learning: A Survey, 2013, Found. Trends Mach. Learn.

[2] Ce Liu, et al. Depth Extraction from Video Using Non-parametric Sampling, 2012, ECCV.

[3] Tsuhan Chen, et al. Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models, 2010, NIPS.

[4] Liang-Gee Chen, et al. A 2D-to-3D conversion system using edge information, 2010, IEEE International Conference on Consumer Electronics (ICCE).

[5] Kilian Q. Weinberger, et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification, 2005, NIPS.

[6] Alexandros Kalousis, et al. Parametric Local Metric Learning for Nearest Neighbor Classification, 2012, NIPS.

[7] Stephen Gould, et al. Single image depth estimation from predicted semantic labels, 2010, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Antonio Torralba, et al. SIFT Flow: Dense Correspondence across Scenes and Its Applications, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Ashutosh Saxena, et al. Learning Depth from Single Monocular Images, 2005, NIPS.

[10] Meng Wang, et al. 2D-to-3D image conversion by learning depth from examples, 2012, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[11] Tsuhan Chen, et al. θ-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding, 2011, NIPS.

[12] Jorge Nocedal, et al. On the limited memory BFGS method for large scale optimization, 1989, Math. Program.

[13] Liang Zhang, et al. 3D-TV Content Generation: 2D-to-3D Conversion, 2006, IEEE International Conference on Multimedia and Expo.