Deep Learning Stereo Vision at the edge

We present an overview of the methodology used to build a new stereo vision solution suitable for a System on Chip. The solution was developed to bring computer vision capability to embedded devices that operate in power-constrained environments. It is constructed as a hybrid between classical stereo vision techniques and deep learning approaches. The stereoscopic module is composed of two separate modules: one that accelerates the neural network we trained and one that accelerates the front-end part. The system is completely passive and does not require structured light to obtain compelling accuracy. Compared with previous stereo vision solutions offered by industry, the major improvement is robustness to noise, which is mainly due to the deep learning part of the chosen architecture. We submitted our results to the Middlebury dataset challenge, where the system currently ranks as the best System on Chip solution. The system has been developed for low-latency applications that require better than real-time performance on high-definition video.
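For readers unfamiliar with passive stereo matching, the following is a minimal sketch of what a purely classical pipeline can look like. It is not the front-end or the network described above, whose details are not given here. It assumes rectified grayscale images, a census transform as the matching descriptor, a Hamming-distance cost volume, and winner-take-all disparity selection. The function names, the window radius, and the disparity range are all illustrative choices.

```python
import numpy as np


def census_transform(img, radius=2):
    """Encode each pixel as a bit string: one bit per neighbour, set when
    the neighbour is darker than the centre pixel."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    code = np.zeros((h, w), dtype=np.uint32)  # radius <= 2 fits in 32 bits
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[radius + dy:radius + dy + h,
                               radius + dx:radius + dx + w]
            bit = (neighbour < img).astype(np.uint32)
            code = (code << np.uint32(1)) | bit
    return code


def hamming(a, b):
    """Per-pixel Hamming distance between two uint32 code images."""
    x = np.bitwise_xor(a, b)
    as_bytes = x.view(np.uint8).reshape(x.shape + (4,))
    return np.unpackbits(as_bytes, axis=-1).sum(axis=-1, dtype=np.int32)


def wta_disparity(left, right, max_disp=16, radius=2):
    """Winner-take-all disparity: left pixel (y, x) is compared with
    right pixel (y, x - d) for every candidate disparity d."""
    cl = census_transform(left, radius)
    cr = census_transform(right, radius)
    h, w = left.shape
    # Candidates that fall outside the right image keep a huge cost.
    cost = np.full((max_disp, h, w), np.iinfo(np.int32).max, dtype=np.int32)
    for d in range(max_disp):
        cost[d, :, d:] = hamming(cl[:, d:], cr[:, :w - d])
    return cost.argmin(axis=0)


# Synthetic check: the left view is the right view shifted by a known disparity.
rng = np.random.default_rng(0)
true_disp, radius = 4, 2
right = rng.random((32, 64))
left = np.roll(right, true_disp, axis=1)
disparity = wta_disparity(left, right, max_disp=16, radius=radius)
interior = disparity[:, true_disp + radius:left.shape[1] - radius]
print("fraction of interior pixels recovered:", np.mean(interior == true_disp))
```

A per-pixel winner-take-all decision like this one is sensitive to noise and repeated texture. That limitation is the usual motivation for combining classical matching with learned components, as the hybrid design above does.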
