论文信息 - Multi-Level Context Ultra-Aggregation for Stereo Matching

Multi-Level Context Ultra-Aggregation for Stereo Matching

Exploiting multi-level context information to cost volume can improve the performance of learning-based stereo matching methods. In recent years, 3-D Convolution Neural Networks (3-D CNNs) show the advantages in regularizing cost volume but are limited by unary features learning in matching cost computation. However, existing methods only use features from plain convolution layers or a simple aggregation of multi-level features to calculate cost volume, which is insufficient because stereo matching requires discriminative features to identify corresponding pixels in rectified stereo image pairs. In this paper, we propose a unary features descriptor using multi-level context ultra-aggregation (MCUA), which encapsulates all convolutional features into a more discriminative representation by intra- and inter-level features combination. Specifically, a child module that takes low-resolution images as input captures larger context information; the larger context information from each layer is densely connected to the main branch of the network. MCUA makes good usage of multi-level features with richer context and performs the image-to-image prediction holistically. We introduce our MCUA scheme for cost volume calculation and test it on PSM-Net. We also evaluate our method on Scene Flow and KITTI 2012/2015 stereo datasets. Experimental results show that our method outperforms state-of-the-art methods by a notable margin and effectively improves the accuracy of stereo matching.

[1] Kyoung Mu Lee,et al. Look Wider to Match Image Patches With Convolutional Neural Networks , 2017, IEEE Signal Processing Letters.

[2] Wei Chen,et al. Learning for Disparity Estimation Through Feature Constancy , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3] Nikos Komodakis,et al. Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Xu Zhao,et al. EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching , 2018, ACCV.

[5] Raquel Urtasun,et al. Efficient Deep Learning for Stereo Matching , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Lior Wolf,et al. Improved Stereo Matching with Constant Highway Networks and Reflective Confidence Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Thomas Brox,et al. A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Qiong Yan,et al. Cascade Residual Learning: A Two-Stage Convolutional Neural Network for Stereo Matching , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[10] Yong-Sheng Chen,et al. Pyramid Stereo Matching Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11] Tomaso A. Poggio,et al. Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex , 2016, ArXiv.

[12] Andreas Geiger,et al. Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Xaq Pitkow,et al. Skip Connections Eliminate Singularities , 2017, ICLR.

[14] Shahram Izadi,et al. StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction , 2018, ECCV.

[15] Zhidong Deng,et al. SegStereo: Exploiting Semantic Information for Disparity Estimation , 2018, ECCV.

[16] Richard Szeliski,et al. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.

[17] Martin A. Fischler,et al. Computational Stereo , 1982, CSUR.

[18] Yann LeCun,et al. Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches , 2015, J. Mach. Learn. Res..

[19] Trevor Darrell,et al. Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20] Jian Sun,et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21] Jun Fu,et al. Stacked Deconvolutional Network for Semantic Segmentation , 2017, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[22] Hui Jiang,et al. Higher Order Recurrent Neural Networks , 2016, ArXiv.

[23] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Yann LeCun,et al. Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Shuicheng Yan,et al. Dual Path Networks , 2017, NIPS.

[26] Yun Fu,et al. Residual Dense Network for Image Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27] Qiang Chen,et al. Network In Network , 2013, ICLR.

[28] Alex Kendall,et al. End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).