论文信息 - ASV: Accelerated Stereo Vision System

ASV: Accelerated Stereo Vision System

Estimating depth from stereo vision cameras, i.e., "depth from stereo", is critical to emerging intelligent applications deployed in energy- and performance-constrained devices, such as augmented reality headsets and mobile autonomous robots. While existing stereo vision systems make trade-offs between accuracy, performance and energy-efficiency, we describe ASV, an accelerated stereo vision system that simultaneously improves both performance and energy-efficiency while achieving high accuracy. The key to ASV is to exploit unique characteristics inherent to stereo vision, and apply stereo-specific optimizations, both algorithmically and computationally. We make two contributions. Firstly, we propose a new stereo algorithm, invariant-based stereo matching (ISM), that achieves significant speedup while retaining high accuracy. The algorithm combines classic "hand-crafted" stereo algorithms with recent developments in Deep Neural Networks (DNNs), by leveraging the correspondence invariant unique to stereo vision systems. Secondly, we observe that the bottleneck of the ISM algorithm is the DNN inference, and in particular the deconvolution operations that introduce massive compute-inefficiencies. We propose a set of software optimizations that mitigate these inefficiencies. We show that with less than 0.5% hardware area overhead, these algorithmic and computational optimizations can be effectively integrated within a conventional DNN accelerator. Overall, ASV achieves 5× speedup and 85% energy saving with 0.02% accuracy loss compared to today's DNN-based stereo vision systems.

[1] Thomas Brox,et al. FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2] Gu-Yeon Wei,et al. DNN Engine: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications , 2018, IEEE Journal of Solid-State Circuits.

[3] Matthew Mattina,et al. Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[4] Gu-Yeon Wei,et al. A 16-nm Always-On DNN Processor With Adaptive Clocking and Multi-Cycle Banked SRAMs , 2019, IEEE Journal of Solid-State Circuits.

[5] Gu-Yeon Wei,et al. A 16nm 25mm2 SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators , 2019, 2019 Symposium on VLSI Circuits.

[6] Thomas Brox,et al. A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Berthold K. P. Horn,et al. Determining Optical Flow , 1981, Other Conferences.

[8] Matthew Mattina,et al. Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective , 2018, ArXiv.

[9] Takeo Kanade,et al. An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[10] Yong-Sheng Chen,et al. Pyramid Stereo Matching Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11] William Pugh,et al. Incremental computation via function caching , 1989, POPL '89.

[12] Hyoukjun Kwon,et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects , 2018, ASPLOS.

[13] Zixiang Xiong,et al. 3D scene reconstruction by multiple structured-light based commodity depth cameras , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[14] Vincent Dumoulin,et al. Deconvolution and Checkerboard Artifacts , 2016 .

[15] Alex Kendall,et al. End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16] Nam Sung Kim,et al. GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[17] H. T. Kung,et al. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization , 2018, ASPLOS.

[18] Rama Chellappa,et al. Accuracy vs Efficiency Trade-offs in Optical Flow Algorithms , 1996, Comput. Vis. Image Underst..

[19] Christoforos E. Kozyrakis,et al. TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators , 2019, ASPLOS.

[20] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[21] William J. Dally,et al. SCNN: An accelerator for compressed-sparse convolutional neural networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[22] Richard Szeliski,et al. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.

[23] Radu Horaud,et al. Scene flow estimation by growing correspondence seeds , 2011, CVPR 2011.

[24] Jose-Maria Arnau,et al. Computation Reuse in DNNs by Exploiting Input Similarity , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[25] Mostafa Mahmoud,et al. Diffy: a Déjà vu-Free Differential Deep Neural Network Accelerator , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[26] Gunnar Farnebäck,et al. Polynomial expansion for orientation and motion estimation , 2002 .

[27] Gunnar Farnebäck,et al. Two-Frame Motion Estimation Based on Polynomial Expansion , 2003, SCIA.

[28] Jonathan Ragan-Kelley,et al. Automatically scheduling halide image processing pipelines , 2016, ACM Trans. Graph..

[29] Christopher W. Fletcher,et al. Morph: Flexible Acceleration for 3D CNN-Based Video Understanding , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[30] Richard Szeliski,et al. Computer Vision - Algorithms and Applications , 2011, Texts in Computer Science.

[31] Kurt Keutzer,et al. Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[33] H. Hirschmüller. Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information , 2005, CVPR.

[34] D. Michie. “Memo” Functions and Machine Learning , 1968, Nature.

[35] Bernhard P. Wrobel,et al. Multiple View Geometry in Computer Vision , 2001 .

[36] Sek M. Chai,et al. An Embedded Vision Services Framework for Heterogeneous Accelerators , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[37] Yu Wang,et al. Real-time high-quality stereo vision system in FPGA , 2013, 2013 International Conference on Field-Programmable Technology (FPT).

[38] Tao Li,et al. Towards Efficient Microarchitectural Design for Accelerating Unsupervised GAN-Based Deep Learning , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[39] Xuan Yang,et al. A Systematic Approach to Blocking Convolutional Neural Networks , 2016, ArXiv.

[40] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[41] Christoforos E. Kozyrakis,et al. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory , 2017, ASPLOS.

[42] Darius Burschka,et al. Advances in Computational Stereo , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[43] Xiangyu Zhang,et al. Channel Pruning for Accelerating Very Deep Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[44] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[45] Qi Tian,et al. Algorithms for subpixel registration , 1986 .

[46] Theocharis Theocharides,et al. High-quality real-time hardware stereo matching based on guided image filtering , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[47] Suren Jayasuriya,et al. EVA²: Exploiting Temporal Redundancy in Live Computer Vision , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[48] Patrick Hansen,et al. FixyNN: Efficient Hardware for Mobile Computer Vision via Transfer Learning , 2019, ArXiv.

[49] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[50] Yiran Chen,et al. Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.

[51] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[52] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.

[53] Amrita Mazumdar,et al. Exploring computation-communication tradeoffs in camera systems , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).

[54] Yu Cao,et al. Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks , 2017, FPGA.

[55] Andreas Geiger,et al. Efficient Large-Scale Stereo Matching , 2010, ACCV.

[56] Federico Tombari,et al. Classification and evaluation of cost aggregation methods for stereo correspondence , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[57] C. Weitkamp. Lidar, Range-Resolved Optical Remote Sensing of the Atmosphere , 2005 .

[58] H.-S. Philip Wong,et al. On-Chip Memory Technology Design Space Explorations for Mobile Deep Neural Network Accelerators , 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).

[59] Xiaowei Li,et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[60] M. Jakubowski,et al. Block-based motion estimation algorithms — a survey , 2013 .

[61] Qingxiong Yang,et al. Hardware-Efficient Bilateral Filtering for Stereo Matching , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[62] Nikolai Smolyanskiy,et al. On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[63] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[64] Matthew Mattina,et al. SCALE-Sim: Systolic CNN Accelerator , 2018, ArXiv.

[65] Natalie D. Enright Jerger,et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[66] Christoforos E. Kozyrakis,et al. Convolution engine: balancing efficiency & flexibility in specialized computing , 2013, ISCA.

[67] Andreas Geiger,et al. Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).