Paper Information - Hierarchical Deep Stereo Matching on High-Resolution Images

We explore the problem of real-time stereo matching on high-resolution imagery. Many state-of-the-art (SOTA) methods struggle to process high-resolution imagery because of memory constraints or speed limitations. To address this issue, we propose an end-to-end framework that searches for correspondences incrementally over a coarse-to-fine hierarchy. Because high-resolution stereo datasets are relatively rare, we introduce a dataset with high-resolution stereo pairs for both training and evaluation. Our approach achieves SOTA performance on Middlebury-v3 and KITTI-15 while running significantly faster than its competitors. The hierarchical design also naturally supports anytime, on-demand reports of disparity by capping intermediate coarse results, which lets us accurately predict disparity for near-range structures with low latency (30 ms). We demonstrate that the performance-vs-speed tradeoff afforded by on-demand hierarchies may address sensing needs for time-critical applications such as autonomous driving.
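
As a rough illustration of why a capped, coarse result can still be accurate for near-range structures, the sketch below applies the standard rectified-stereo relation Z = f * B / d and its first-order error |dZ| ≈ Z^2 / (f * B) * |dd|. This is textbook stereo geometry, not the paper's network. The focal length, the baseline, and the "one pixel of error per pyramid level" model are illustrative assumptions, and all function names are hypothetical.

```python
# Illustrative camera parameters (assumed, not taken from the paper).
FOCAL_PX = 1000.0   # focal length in full-resolution pixels
BASELINE_M = 0.5    # stereo baseline in metres


def depth_from_disparity(disparity_px, focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """Rectified pinhole stereo: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, disparity_err_px, focal_px=FOCAL_PX, baseline_m=BASELINE_M):
    """First-order depth error: |dZ| ~= Z^2 / (f * B) * |dd|."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


def error_by_level(depth_m, downsample_factors, err_px_at_level=1.0):
    """Depth error at each pyramid level.

    Assumption: an error of `err_px_at_level` pixels at a level that is
    downsampled by a factor s corresponds to s * err_px_at_level pixels
    at full resolution.
    """
    return {
        s: depth_error(depth_m, err_px_at_level * s)
        for s in downsample_factors
    }


if __name__ == "__main__":
    for z in (5.0, 50.0):
        errors = error_by_level(z, downsample_factors=(1, 2, 4, 8))
        print(f"depth {z:5.1f} m:", {s: round(e, 3) for s, e in errors.items()})
```

Under these assumed values, a one-pixel error at the 8x-downsampled level corresponds to about 0.4 m at a depth of 5 m but about 40 m at a depth of 50 m. Because the error grows quadratically with depth, a coarse early report is far more useful for nearby structures than for distant ones.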
