DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo

In this paper, we introduce a local image descriptor, DAISY, which is very efficient to compute densely. We also present an EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using this descriptor. This yields much better results in wide-baseline situations than the pixel and correlation-based algorithms that are commonly used in narrow-baseline stereo. Also, using a descriptor makes our algorithm robust against many photometric and geometric transformations. Our descriptor is inspired from earlier ones such as SIFT and GLOH but can be computed much faster for our purposes. Unlike SURF, which can also be computed efficiently at every pixel, it does not introduce artifacts that degrade the matching performance when used densely. It is important to note that our approach is the first algorithm that attempts to estimate dense depth maps from wide-baseline image pairs, and we show that it is a good one at that with many experiments for depth estimation accuracy, occlusion detection, and comparing it against other descriptors on laser-scanned ground truth scenes. We also tested our approach on a variety of indoor and outdoor scenes with different photometric and geometric transformations and our experiments support our claim to being robust against these.

[1]  D. N. Yates,et al.  The DAISY Code , 1973 .

[2]  Thomas O. Binford,et al.  Depth from Edge and Intensity Based Stereo , 1981, IJCAI.

[3]  Oscar Firschein,et al.  Readings in computer vision: issues, problems, principles, and paradigms , 1987 .

[4]  L. Quam Hierarchical warp stereo , 1987 .

[5]  Takeo Kanade,et al.  A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiment , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Olivier D. Faugeras,et al.  Computing differential properties of 3-D shapes from stereoscopic images without 3-D models , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Aaron F. Bobick,et al.  Disparity-Space Images and Large Occlusion Stereo , 1994, ECCV.

[8]  Ingemar J. Cox,et al.  A maximum-flow formulation of the N-camera stereo correspondence problem , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[9]  Carlo Tomasi,et al.  A Pixel Dissimilarity Measure That Is Insensitive to Image Sampling , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Olivier D. Faugeras,et al.  Complete Dense Stereovision Using Level Set Methods , 1998, ECCV.

[11]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[12]  Rachid Deriche,et al.  Dense Disparity Map Estimation Respecting Image Discontinuities: A PDE and Scale-Space BasedApproach , 2002, MVA.

[13]  G. Medioni,et al.  Tensor Voting : Theory and Applications , 2000 .

[14]  Luc Van Gool,et al.  Wide Baseline Stereo Matching based on Local, Affinely Invariant Regions , 2000, BMVC.

[15]  Jitendra Malik,et al.  Geometric blur for template matching , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[16]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[17]  Vladimir Kolmogorov,et al.  Multi-camera Scene Reconstruction via Graph Cuts , 2002, ECCV.

[18]  Luc Van Gool,et al.  Dense matching of multiple wide-baseline views , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[19]  Darius Burschka,et al.  Advances in Computational Stereo , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[21]  Kiriakos N. Kutulakos,et al.  A Theory of Shape by Space Carving , 2000, International Journal of Computer Vision.

[22]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  Alan L. Yuille,et al.  Occlusions and binocular stereo , 1992, International Journal of Computer Vision.

[25]  Fatih Murat Porikli,et al.  Integral histogram: a fast way to extract histograms in Cartesian spaces , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[27]  Jian Yao,et al.  3D modeling and rendering from multiple wide-baseline images by match propagation , 2006, Signal Process. Image Commun..

[28]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  Luc Van Gool,et al.  Combined Depth and Outlier Estimation in Multi-View Stereo , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[30]  Matthew A. Brown,et al.  Learning Local Image Descriptors , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Pascal Fua,et al.  On benchmarking camera calibration and multi-view stereo for high resolution imagery , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .