Efficient Hybrid Tree-Based Stereo Matching With Applications to Postcapture Image Refocusing

Estimating dense correspondence or depth information from a pair of stereoscopic images is a fundamental problem in computer vision with a range of important applications. Despite intensive past research on this topic, recovering depth information both reliably and efficiently remains challenging, especially when the input images contain weakly textured regions or are captured under uncontrolled, real-life conditions. To strike a balance between computational efficiency and estimation quality, this paper proposes a hybrid minimum spanning tree-based stereo matching method. Our method performs efficient nonlocal cost aggregation at the pixel level and the region level, and then adaptively fuses the resulting costs to leverage their respective strengths in handling large textureless regions and fine depth discontinuities. Experiments on the standard Middlebury stereo benchmark show that the proposed stereo method outperforms all prior local and nonlocal aggregation-based methods, with particularly noticeable improvements in low-texture regions. To further demonstrate the effectiveness of the proposed stereo method, and motivated by the increasing desire to generate expressive depth-induced photo effects, this paper then addresses the emerging application of interactive depth-of-field rendering from a real-world stereo image pair. To this end, we propose an accurate thin-lens model for synthetic depth-of-field rendering, which accounts for the user-stroke placement and camera-specific parameters and performs pixel-adapted Gaussian blurring in a principled way. Taking ~1.5 s to process a pair of 640×360 images in the off-line step, our system, named Scribble2focus, allows users to interactively select in-focus regions with simple strokes on a touch screen and instantly returns the synthetically refocused images.
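
To make the thin-lens idea concrete, the following is a minimal sketch of how a per-pixel blur strength could be derived from a disparity value. It is not the model proposed in the paper. It uses the textbook thin-lens circle-of-confusion formula and the standard rectified-stereo relation between disparity and depth. All function names, the parameter values, and the proportionality constant `k` that maps the blur-circle diameter to a Gaussian sigma are assumptions made for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Rectified-stereo depth: Z = f * B / d (f in pixels, B in meters)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def circle_of_confusion(depth_m, focus_m, focal_len_m, f_number):
    """Textbook thin-lens blur-circle diameter on the sensor, in meters.

    c = A * f * |Z - Z_f| / (Z * (Z_f - f)), with aperture diameter A = f / N.
    """
    if focus_m <= focal_len_m:
        raise ValueError("focus distance must exceed the focal length")
    aperture = focal_len_m / f_number
    return (aperture * focal_len_m * abs(depth_m - focus_m)
            / (depth_m * (focus_m - focal_len_m)))


def blur_sigma_px(depth_m, focus_m, focal_len_m, f_number, pixel_pitch_m, k=0.5):
    """Gaussian sigma in pixels, assumed proportional to the blur circle.

    The constant k is a hypothetical tuning parameter, not a value from the paper.
    """
    coc_m = circle_of_confusion(depth_m, focus_m, focal_len_m, f_number)
    return k * coc_m / pixel_pitch_m


if __name__ == "__main__":
    # Hypothetical camera: 50 mm lens at f/2, 5 micron pixels, focused at 2 m.
    focus = 2.0
    for disparity in (50.0, 25.0, 12.5):
        z = depth_from_disparity(disparity, focal_px=1000.0, baseline_m=0.1)
        sigma = blur_sigma_px(z, focus, 0.05, 2.0, 5e-6)
        print(f"disparity={disparity:5.1f} px  depth={z:4.1f} m  sigma={sigma:6.2f} px")
```

In an interactive setting, the focus distance could be taken from the depth under the user's stroke, so that pixels at that depth receive zero blur and the blur grows as depth moves away from it.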
