Paper information - RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT [24]. It introduces multi-level convolutional GRUs, which propagate information across the image more efficiently. A modified version of RAFT-Stereo can perform accurate real-time inference. RAFT-Stereo ranks first on the Middlebury leaderboard, outperforming the next best method on 1px error by 29%, and outperforms all published work on the ETH3D two-view stereo benchmark. Code is available at https://github.com/princeton-vl/RAFT-Stereo.
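
The "1px error" named in the abstract is commonly understood as the percentage of pixels whose predicted disparity differs from the ground truth by more than one pixel. The following is a minimal sketch of that reading, not the benchmark's official evaluation code. The function name `one_px_error`, the use of `None` to mark pixels without ground truth, and the sample values are all assumptions made for illustration; the actual benchmarks may apply additional masks (for example, non-occluded regions only).

```python
def one_px_error(pred, gt, threshold=1.0):
    """Percentage of valid pixels whose absolute disparity error exceeds `threshold`.

    `pred` and `gt` are equally sized 2D lists of disparities. A ground-truth
    value of None marks a pixel without ground truth (an assumed convention,
    not taken from any benchmark's file format).
    """
    bad = 0
    valid = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if g is None:
                continue  # skip pixels that have no ground truth
            valid += 1
            if abs(p - g) > threshold:
                bad += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * bad / valid


# Made-up disparities for illustration only; these are not results from the paper.
pred = [[10.0, 12.5, 7.0], [3.2, 4.0, 9.9]]
gt = [[10.4, 10.0, None], [3.0, 6.5, 10.0]]
print(one_px_error(pred, gt))  # 2 of 5 valid pixels are off by more than 1px -> 40.0
```

Under this definition a lower value is better, so "outperforming on 1px error" means a smaller percentage of such pixels.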

[1] Sang Uk Lee, et al. Robust Stereo Matching Using Adaptive Normalized Cross-Correlation, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Wei Chen, et al. Learning for Disparity Estimation Through Feature Constancy, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3] Mehrtash Harandi, et al. Hierarchical Neural Architecture Search for Deep Stereo Matching, 2020, NeurIPS.

[4] Yinda Zhang, et al. HITNet: Hierarchical Iterative Tile Refinement Network for Real-time Stereo Matching, 2020, ArXiv.

[5] Ruigang Yang, et al. Domain-invariant Stereo Matching Networks, 2019, ECCV.

[6] Vladimir Kolmogorov, et al. Convergent Tree-Reweighted Message Passing for Energy Minimization, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7] Takeshi Naemura, et al. Continuous 3D Label Stereo Matching Using Local Expansion Moves, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Marsha Jo Hannah, et al. Computer matching of areas in stereo images, 1974.

[9] Ruigang Yang, et al. Learning Depth with Convolutional Spatial Propagation Network, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10] Yong-Sheng Chen, et al. Pyramid Stereo Matching Network, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11] Michael J. Black, et al. A Naturalistic Open Source Movie for Optical Flow Evaluation, 2012, ECCV.

[12] Xu Zhao, et al. EdgeStereo: An Effective Multi-task Learning Network for Stereo Matching and Edge Detection, 2020, International Journal of Computer Vision.

[13] Rui Fan, et al. PVStereo: Pyramid Voting Module for End-to-End Self-Supervised Stereo Matching, 2021, IEEE Robotics and Automation Letters.

[14] Xinguo Liu, et al. Superpixel alpha-expansion and normal adjustment for stereo matching, 2021, J. Vis. Commun. Image Represent.

[15] Ruigang Yang, et al. GA-Net: Guided Aggregation Net for End-To-End Stereo Matching, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Alex Kendall, et al. End-to-End Learning of Geometry and Context for Deep Stereo Regression, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17] Yi Wang, et al. CrossPatch-Based Rolling Label Expansion for Dense Stereo Matching, 2020, IEEE Access.

[18] Thomas Brox, et al. A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Jan Kautz, et al. Bi3D: Stereo Depth Estimation via Binary Classifications, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Torsten Sattler, et al. A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Jun Zhou, et al. Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching, 2020, AAAI.

[22] Gabriel J. Brostow, et al. Learning Stereo from Single Images, 2020, ECCV.

[23] Xiaogang Wang, et al. Group-Wise Correlation Stereo Network, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Jia Deng, et al. RAFT: Recurrent All-Pairs Field Transforms for Optical Flow, 2020, ECCV.

[25] Stanley T. Birchfield, et al. Falling Things: A Synthetic Dataset for 3D Object Detection and Pose Estimation, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[26] Trevor Darrell, et al. Hierarchical Discrete Distribution Decomposition for Match Density Estimation, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Nicholay Topin, et al. Super-convergence: very fast training of neural networks using large learning rates, 2018, Defense + Commercial Sensing.

[28] Andrea Vedaldi, et al. Instance Normalization: The Missing Ingredient for Fast Stylization, 2016, ArXiv.

[29] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[30] Maxime Moreaud, et al. Sparse Stereo Disparity Map Densification Using Hierarchical Image Segmentation, 2017, ISMM.

[31] Xinge Zhu, et al. AdaStereo: A Simple and Efficient Approach for Adaptive Stereo Matching, 2020, Computer Vision and Pattern Recognition.

[32] Adam Finkelstein, et al. PatchMatch: a randomized correspondence algorithm for structural image editing, 2009, SIGGRAPH 2009.

[33] Long Quan, et al. MVSNet: Depth Inference for Unstructured Multi-view Stereo, 2018, ECCV.

[34] Qiong Yan, et al. Cascade Residual Learning: A Two-Stage Convolutional Neural Network for Stereo Matching, 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[35] Thomas Brox, et al. FlowNet: Learning Optical Flow with Convolutional Networks, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[36] Ashish Kapoor, et al. TartanAir: A Dataset to Push the Limits of Visual SLAM, 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[37] James K. Archibald, et al. Improved Census Transforms for Resource-Optimized Stereo Vision, 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[38] Ramin Zabih, et al. Non-parametric Local Transforms for Computing Visual Correspondence, 1994, ECCV.

[39] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.

[40] Christian Heipke, et al. Joint 3d Estimation of Vehicles and Scene Flow, 2015.

[41] Xi Wang, et al. High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth, 2014, GCPR.

[42] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.

[43] Jungwon Lee, et al. SUW-Learn: Joint Supervised, Unsupervised, Weakly Supervised Deep Learning for Monocular Depth Estimation, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[44] Jungwon Lee, et al. AMNet: Deep Atrous Multiscale Stereo Disparity Estimation Networks, 2019, ArXiv.

[45] Michael Happold, et al. Hierarchical Deep Stereo Matching on High-Resolution Images, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46] Yann LeCun, et al. Computing the stereo matching cost with a convolutional neural network, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).