A learning-based framework for depth ordering

Depth ordering is instrumental for understanding the 3D geometry of an image. Humans are surprisingly good at depth ordering, even for abstract 2D line drawings. In this paper, we propose a learning-based framework for depth ordering inference. Boundary and junction characteristics are important cues for this task, and we develop new features based on these attributes. Although each feature alone can produce reasonable depth ordering results, each has limitations, and combining them yields better performance. In practice, local depth ordering inferences can contradict one another. We therefore propose a Markov Random Field model with terms that are more global than those used in previous work, and use graph optimization to encourage a globally consistent ordering. In addition, to produce better object segmentation for depth ordering, we explicitly enforce closed loops and long edges in occlusion boundary detection. We collect a new depth-order dataset for this problem, comprising more than a thousand human-labeled images of various everyday objects and configurations. The proposed algorithm shows promising performance compared with conventional methods on both synthetic and real scenes.
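To make the idea of reconciling contradictory local inferences concrete, the following is a minimal sketch and not the Markov Random Field model or graph optimization used in the paper. It assumes each local cue has already been reduced to a probability that one region is in front of another, and it searches all orderings exhaustively, which is only feasible for a handful of regions. The function name `consistent_order` and the input format are hypothetical.

```python
from itertools import permutations


def consistent_order(num_regions, pairwise):
    """Return the front-to-back ordering that least contradicts local cues.

    pairwise maps (i, j) to the assumed probability that region i is in
    front of region j. This is an illustrative brute-force search, not the
    paper's MRF formulation.
    """
    best_order, best_cost = None, float("inf")
    for order in permutations(range(num_regions)):
        # rank[r] == 0 means region r is closest to the camera.
        rank = {region: k for k, region in enumerate(order)}
        cost = 0.0
        for (i, j), p in pairwise.items():
            # Pay the probability mass of the local belief we disagree with.
            cost += (1.0 - p) if rank[i] < rank[j] else p
        if cost < best_cost:
            best_order, best_cost = list(order), cost
    return best_order, best_cost


# Three local cues that form a contradictory cycle: 0 < 1, 1 < 2, 2 < 0.
local_cues = {(0, 1): 0.9, (1, 2): 0.8, (2, 0): 0.6}
order, cost = consistent_order(3, local_cues)
print(order, round(cost, 3))
```

In this toy example the three cues cannot all hold at once, so the search keeps the two more confident cues and overrides the weakest one, (2, 0). A real system would replace the exhaustive search with an approximate optimizer, since the number of orderings grows factorially with the number of regions.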
