Better Stereo Matching From Simple Yet Effective Wrangling of Deep Features

The cost volume plays a pivotal role in stereo matching. Most recent work has focused on deep feature extraction and cost refinement to obtain a more accurate cost volume. We instead approach the problem from a different perspective: feature wrangling. We find that simple wrangling of deep features can effectively improve cost volume construction and, in turn, stereo matching performance. Specifically, we develop two simple yet effective techniques for wrangling deep features: a differentiable feature transformation that operates spatially, and a memory-economical feature expansion that operates channel-wise. The differentiable rank transform supplies local ordering information that enhances the search for correspondence, while disparity division lets the feature expansion admit more features into the cost volume without requiring extra memory. Equipped with these two feature wrangling techniques, our simple network achieves outstanding performance on the widely used KITTI and Scene Flow datasets.
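To make the idea of a differentiable rank transform concrete, the following is a minimal sketch and not the formulation used in this work, which the abstract does not specify. It assumes the classical rank transform (for each pixel, the count of window neighbors with a smaller value) and relaxes the hard comparison with a sigmoid so that gradients can flow. The function name `soft_rank_transform` and the parameters `radius` and `temperature` are hypothetical, chosen for illustration only.

```python
import numpy as np


def soft_rank_transform(feat, radius=1, temperature=0.1):
    """Soft rank of each pixel within a (2*radius+1)^2 window.

    feat: array of shape (C, H, W). Returns an array of the same shape.
    Assumption: the hard test "neighbor < center" of the classical rank
    transform is replaced by sigmoid((center - neighbor) / temperature).
    """
    feat = np.asarray(feat, dtype=np.float64)
    _, h, w = feat.shape
    # Replicate border values so every pixel has a full window.
    padded = np.pad(feat, ((0, 0), (radius, radius), (radius, radius)), mode="edge")
    rank = np.zeros_like(feat)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            if dy == radius and dx == radius:
                continue  # skip the center pixel itself
            neighbor = padded[:, dy:dy + h, dx:dx + w]
            z = (feat - neighbor) / temperature
            # Numerically stable sigmoid: 0.5 * (1 + tanh(z / 2)).
            rank += 0.5 * (1.0 + np.tanh(0.5 * z))
    return rank


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((4, 8, 8))
    ranks = soft_rank_transform(features, radius=1, temperature=0.05)
    print(ranks.shape)
```

As `temperature` approaches zero the output approaches the hard neighbor count, while larger values give smoother gradients. In a learned pipeline the same computation would be written in an autodiff framework; NumPy is used here only to keep the sketch self-contained.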
