Toward Perceptually-Consistent Stereo: A Scanline Study

A stereo pair carries two types of information: correlation (matching) and decorrelation (half-occlusion). Vision science has shown that the visual cortex uses both, and that people can perceive depth even when correlation cues are absent or very weak, a capability missing from most computational stereo systems. As a step toward stereo algorithms that are more consistent with these perceptual phenomena, we re-examine scanline stereo as energy minimization. We represent a disparity profile as a piecewise-smooth function with explicit breakpoints between its smooth pieces, and we show that this allows correlation and decorrelation to be integrated into an objective that requires only two types of local information: the correlation and its spatial gradient. Experimentally, we show that the global optimum of this objective matches human perception on a broad collection of well-known perceptual stimuli, and that it also provides reasonable piecewise-smooth interpretations of depth in natural images, even without exploiting monocular boundary cues.
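To make the idea of a scanline profile with explicit breakpoints concrete, the following is a minimal sketch of optimal partitioning by dynamic programming. It is not the objective described above: it assumes a precomputed per-pixel matching cost array (hypothetical input `cost`), restricts each piece to a constant disparity rather than a smooth one, ignores the correlation gradient and half-occlusion terms, and charges a hypothetical fixed `breakpoint_penalty` per piece. The function name `segment_scanline` is likewise illustrative.

```python
import numpy as np


def segment_scanline(cost, breakpoint_penalty):
    """Globally optimal piecewise-constant disparity profile for one scanline.

    cost: array of shape (n_pixels, n_disparities); lower means a better match.
    breakpoint_penalty: fixed charge per piece (an assumed, simplified prior).
    Returns (disparity per pixel, sorted list of breakpoint positions).
    """
    n, n_disp = cost.shape
    # prefix[j] - prefix[i] is the summed cost of pixels i..j-1 at each disparity.
    prefix = np.vstack([np.zeros((1, n_disp)), np.cumsum(cost, axis=0)])

    best = np.full(n + 1, np.inf)        # best[j]: optimal energy of pixels 0..j-1
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)    # start index of the last piece ending at j
    label = np.zeros(n + 1, dtype=int)   # disparity of that last piece

    for j in range(1, n + 1):
        seg = prefix[j] - prefix[:j]     # (j, n_disp): cost of piece [i, j)
        d_best = seg.argmin(axis=1)      # best constant disparity for each start i
        total = best[:j] + seg[np.arange(j), d_best] + breakpoint_penalty
        i = int(total.argmin())
        best[j], back[j], label[j] = total[i], i, d_best[i]

    # Trace back from the end of the scanline to recover pieces and breakpoints.
    disparity = np.empty(n, dtype=int)
    breakpoints = []
    j = n
    while j > 0:
        i = back[j]
        disparity[i:j] = label[j]
        if i > 0:
            breakpoints.append(i)
        j = i
    return disparity, sorted(breakpoints)
```

Because the recursion considers every possible start of the last piece, the result is a global optimum of this simplified energy, at a cost quadratic in the scanline length. Raising `breakpoint_penalty` merges pieces; lowering it allows more breakpoints.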
