A 4.29nJ/pixel Stereo Depth Coprocessor With Pixel Level Pipeline and Region Optimized Semi-Global Matching for IoT Application

The semi-global matching (SGM) algorithm is a well-known depth-estimation method in stereo vision because it generates dense and robust disparity maps. However, its computational complexity makes it difficult to meet the real-time processing and low-power requirements of Internet-of-Things (IoT) applications. In this paper, we propose a hardware-oriented SGM algorithm with a pixel-level pipeline and region-optimized cost aggregation for high-speed processing and low hardware-resource usage. First, the matching costs in a region are integrated with an optimization strategy to significantly reduce memory usage and improve the processing speed of the cost aggregation. Then, a two-layer parallel two-stage pipeline (TPTP) architecture, which enables pixel-level processing, is designed to compute the aggregation along two directions (0° and 135°), addressing the crucial computational bottleneck of the SGM algorithm. Finally, the architecture is demonstrated on a low-cost Xilinx Spartan-7 FPGA and an advanced Stratix-V FPGA for VGA ($640\times 480$) depth estimation. The experimental results show that the proposed architecture preserves accuracy despite its compact hardware implementation. The pixel-level pipeline enables a processing speed of 355 frames per second (fps) at 109 MHz on the Spartan-7 FPGA and 508 fps at 156 MHz on the Stratix-V FPGA. The coprocessor achieves an energy efficiency of 4.74 nJ/pixel at a power dissipation of 517 mW on the Spartan-7 and 4.29 nJ/pixel at a power dissipation of 669 mW on the Stratix-V.
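The reported throughput and energy figures can be cross-checked with simple arithmetic. The sketch below is not part of the proposed design; it assumes that a pixel-level pipeline emits roughly one pixel per clock cycle and that energy per pixel equals power dissipation divided by pixel throughput. The function and variable names are illustrative only.

```python
# Sanity check of the figures quoted in the abstract.
# Assumption (not stated in the source): the pixel-level pipeline
# processes about one pixel per clock cycle, and energy per pixel is
# power divided by pixel throughput.

WIDTH, HEIGHT = 640, 480            # VGA resolution from the abstract
PIXELS_PER_FRAME = WIDTH * HEIGHT   # 307200 pixels


def frames_per_second(clock_hz, pixels_per_cycle=1.0):
    """Frame rate implied by a clock frequency and a pixels-per-cycle rate."""
    return clock_hz * pixels_per_cycle / PIXELS_PER_FRAME


def energy_per_pixel_nj(power_w, fps):
    """Energy per pixel in nanojoules for a given power and frame rate."""
    return power_w / (fps * PIXELS_PER_FRAME) * 1e9


# Values copied from the abstract.
REPORTED = {
    "Spartan-7": {"clock_hz": 109e6, "fps": 355, "power_w": 0.517},
    "Stratix-V": {"clock_hz": 156e6, "fps": 508, "power_w": 0.669},
}


def summarize():
    """Return {device: (implied_fps, energy_nj_per_pixel)}."""
    rows = {}
    for device, r in REPORTED.items():
        implied_fps = frames_per_second(r["clock_hz"])
        energy_nj = energy_per_pixel_nj(r["power_w"], r["fps"])
        rows[device] = (implied_fps, energy_nj)
    return rows


if __name__ == "__main__":
    for device, (fps, nj) in summarize().items():
        print(f"{device}: {fps:.1f} fps at 1 pixel/cycle, {nj:.2f} nJ/pixel")
```

Under these assumptions, the implied frame rates fall within a fraction of a frame of the reported 355 fps and 508 fps, and the computed energies round to the reported 4.74 nJ/pixel and 4.29 nJ/pixel, which is consistent with a pipeline that sustains close to one pixel per clock cycle.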