RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT [35]. We introduce multi-level convolutional GRUs, which more efficiently propagate information across the image. A modified version of RAFT-Stereo can perform accurate real-time inference. RAFT-stereo ranks first on the Middlebury leaderboard, outperforming the next best method on 1px error by 29% and outperforms all published work on the ETH3D two-view stereo benchmark. Code is available at https://github.com/princeton-vl/RAFT-Stereo.

[1]  Sang Uk Lee,et al.  Robust Stereo Matching Using Adaptive Normalized Cross-Correlation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Wei Chen,et al.  Learning for Disparity Estimation Through Feature Constancy , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Mehrtash Harandi,et al.  Hierarchical Neural Architecture Search for Deep Stereo Matching , 2020, NeurIPS.

[4]  Yinda Zhang,et al.  HITNet: Hierarchical Iterative Tile Refinement Network for Real-time Stereo Matching , 2020, ArXiv.

[5]  Ruigang Yang,et al.  Domain-invariant Stereo Matching Networks , 2019, ECCV.

[6]  Vladimir Kolmogorov,et al.  Convergent Tree-Reweighted Message Passing for Energy Minimization , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Takeshi Naemura,et al.  Continuous 3D Label Stereo Matching Using Local Expansion Moves , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Marsha Jo Hannah,et al.  Computer matching of areas in stereo images. , 1974 .

[9]  Ruigang Yang,et al.  Learning Depth with Convolutional Spatial Propagation Network , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Yong-Sheng Chen,et al.  Pyramid Stereo Matching Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[12]  Xu Zhao,et al.  EdgeStereo: An Effective Multi-task Learning Network for Stereo Matching and Edge Detection , 2020, International Journal of Computer Vision.

[13]  Rui Fan,et al.  PVStereo: Pyramid Voting Module for End-to-End Self-Supervised Stereo Matching , 2021, IEEE Robotics and Automation Letters.

[14]  Xinguo Liu,et al.  Superpixel alpha-expansion and normal adjustment for stereo matching , 2021, J. Vis. Commun. Image Represent..

[15]  Ruigang Yang,et al.  GA-Net: Guided Aggregation Net for End-To-End Stereo Matching , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Yi Wang,et al.  CrossPatch-Based Rolling Label Expansion for Dense Stereo Matching , 2020, IEEE Access.

[18]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jan Kautz,et al.  Bi3D: Stereo Depth Estimation via Binary Classifications , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Torsten Sattler,et al.  A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jun Zhou,et al.  Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching , 2020, AAAI.

[22]  Gabriel J. Brostow,et al.  Learning Stereo from Single Images , 2020, ECCV.

[23]  Xiaogang Wang,et al.  Group-Wise Correlation Stereo Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jia Deng,et al.  RAFT: Recurrent All-Pairs Field Transforms for Optical Flow , 2020, ECCV.

[25]  Stanley T. Birchfield,et al.  Falling Things: A Synthetic Dataset for 3D Object Detection and Pose Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[26]  Trevor Darrell,et al.  Hierarchical Discrete Distribution Decomposition for Match Density Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Nicholay Topin,et al.  Super-convergence: very fast training of neural networks using large learning rates , 2018, Defense + Commercial Sensing.

[28]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[29]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[30]  Maxime Moreaud,et al.  Sparse Stereo Disparity Map Densification Using Hierarchical Image Segmentation , 2017, ISMM.

[31]  Xinge Zhu,et al.  AdaStereo: A Simple and Efficient Approach for Adaptive Stereo Matching , 2020, Computer Vision and Pattern Recognition.

[32]  Adam Finkelstein,et al.  PatchMatch: a randomized correspondence algorithm for structural image editing , 2009, SIGGRAPH 2009.

[33]  Long Quan,et al.  MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[34]  Qiong Yan,et al.  Cascade Residual Learning: A Two-Stage Convolutional Neural Network for Stereo Matching , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[35]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[36]  Ashish Kapoor,et al.  TartanAir: A Dataset to Push the Limits of Visual SLAM , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[37]  James K. Archibald,et al.  Improved Census Transforms for Resource-Optimized Stereo Vision , 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[38]  Ramin Zabih,et al.  Non-parametric Local Transforms for Computing Visual Correspondence , 1994, ECCV.

[39]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[40]  Christian Heipke,et al.  Joint 3d Estimation of Vehicles and Scene Flow , 2015 .

[41]  Xi Wang,et al.  High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth , 2014, GCPR.

[42]  Frank Hutter,et al.  Decoupled Weight Decay Regularization , 2017, ICLR.

[43]  Jungwon Lee,et al.  SUW-Learn: Joint Supervised, Unsupervised, Weakly Supervised Deep Learning for Monocular Depth Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[44]  Jungwon Lee,et al.  AMNet: Deep Atrous Multiscale Stereo Disparity Estimation Networks , 2019, ArXiv.

[45]  Michael Happold,et al.  Hierarchical Deep Stereo Matching on High-Resolution Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).