R3SGM: Real-Time Raster-Respecting Semi-Global Matching for Power-Constrained Systems

Stereo depth estimation is used for many computer vision applications. Though many popular methods strive solely for depth quality, for real-time mobile applications (e.g. prosthetic glasses or micro-UAVs), speed and power efficiency are equally, if not more, important. Many real-world systems rely on Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but power efficiency is hard to achieve with conventional hardware, making the use of embedded devices such as FPGAs attractive for low-power applications. However, the full SGM algorithm is ill-suited to deployment on FPGAs, and so most FPGA variants of it are partial, at the expense of accuracy. In a non-FPGA context, the accuracy of SGM has been improved by More Global Matching (MGM), which also helps tackle the streaking artifacts that afflict SGM. In this paper, we propose a novel, resource-efficient method that is inspired by MGM's techniques for improving depth quality, but which can be implemented to run in real time on a low-power FPGA. Through evaluation on multiple datasets (KITTI and Middlebury), we show that in comparison to other real-time capable stereo approaches, we can achieve a state-of-the-art balance between accuracy, power efficiency and speed, making our approach highly desirable for use in real-time systems with limited power.
