Paper Information - Fast Deep Stereo with 2D Convolutional Processing of Cost Signatures

Modern neural network-based algorithms are able to produce highly accurate depth estimates from stereo image pairs, nearly matching the reliability of measurements from more expensive depth sensors. However, this accuracy comes with a higher computational cost since these methods use network architectures designed to compute and process matching scores across all candidate matches at all locations, with floating-point computations repeated across a match volume with dimensions corresponding to both space and disparity. This leads to longer running times to process each image pair, making them impractical for real-time use in robots and autonomous vehicles. We propose a new stereo algorithm that employs a significantly more efficient network architecture. Our method builds an initial match cost volume using traditional matching costs that are fast to compute, and trains a network to estimate disparity from this volume. Crucially, our network only employs per-pixel and two-dimensional convolution operations: to summarize the local match information at each location as a low-dimensional feature vector, and to spatially process these "cost-signature" features to produce a dense disparity map. Experimental results on KITTI show that our method delivers competitive accuracy at significantly higher speeds, running at 48 frames per second on a modern GPU.
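The first two stages described in the abstract can be illustrated with a minimal NumPy sketch. The abstract only says "traditional matching costs", so the census transform with Hamming distance used here is an assumed stand-in, not necessarily the authors' choice. The function names, the window radius, the signature size of 8, and the random (untrained) projection weights are likewise hypothetical. The spatial 2D-convolution stage and all training are omitted.

```python
import numpy as np

def census_transform(img, radius=2):
    """Encode each pixel as a bit vector: is each neighbour darker than the centre?"""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = padded[radius + dy : radius + dy + h,
                             radius + dx : radius + dx + w]
            bits.append(shifted < img)
    return np.stack(bits, axis=-1)  # (H, W, n_bits), boolean

def hamming_cost_volume(left, right, max_disp):
    """Cost volume (H, W, max_disp): Hamming distance between census codes."""
    cl, cr = census_transform(left), census_transform(right)
    h, w, n_bits = cl.shape
    # Candidates that fall outside the right image keep the maximum cost.
    volume = np.full((h, w, max_disp), float(n_bits), dtype=np.float32)
    for d in range(max_disp):
        # Left pixel (y, x) is compared with right pixel (y, x - d).
        volume[:, d:, d] = np.sum(cl[:, d:] != cr[:, : w - d], axis=-1)
    return volume

def cost_signature(volume, weights, bias):
    """Per-pixel (1x1-convolution-style) projection of each cost vector, then ReLU."""
    return np.maximum(volume @ weights + bias, 0.0)

# Synthetic pair: the left view is the right view shifted by 5 pixels.
rng = np.random.default_rng(0)
right = rng.random((16, 48)).astype(np.float32)
left = np.roll(right, 5, axis=1)

volume = hamming_cost_volume(left, right, max_disp=16)        # (16, 48, 16)
weights = rng.standard_normal((16, 8)).astype(np.float32)     # untrained, illustrative
signature = cost_signature(volume, weights, np.zeros(8, dtype=np.float32))  # (16, 48, 8)
```

In this toy pair the matching cost at the true shift of 5 is zero for interior pixels. In a trained system the projection weights would be learned, and a 2D convolutional network would map the `signature` array to a dense disparity map. Because the per-pixel projection removes the disparity axis early on, the later layers work on an ordinary multi-channel image, so only per-pixel and two-dimensional convolutions are needed.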

Ayan Chakrabarti | Kyle Yee
