ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems

In this paper we present ActiveStereoNet, the first deep learning solution for active stereo systems. Due to the lack of ground truth, our method is fully self-supervised, yet it produces precise depth with a subpixel precision of 1/30th of a pixel; it does not suffer from the common over-smoothing issues; it preserves edges; and it explicitly handles occlusions. We introduce a novel reconstruction loss that is more robust to noise and texture-less patches, and is invariant to illumination changes. The proposed loss is optimized using a window-based cost aggregation with an adaptive support weight scheme. This cost aggregation is edge-preserving and smooths the loss function, which is key to allowing the network to reach compelling results. Finally, we show how the task of predicting invalid regions, such as occlusions, can be trained end-to-end without ground truth. This component is crucial to reduce blur and particularly improves predictions along depth discontinuities. Extensive quantitative and qualitative evaluations on real and synthetic data demonstrate state-of-the-art results in many challenging scenes.
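To make the idea of window-based cost aggregation with adaptive support weights concrete, the following is a minimal NumPy sketch. It is an illustration under stated assumptions, not the formulation used in this paper: the function name `adaptive_support_aggregate`, the intensity-only weight exp(-|I(p) - I(q)| / sigma), and the parameters `radius` and `sigma` are all hypothetical choices made for the example. Each pixel's cost is replaced by a weighted average over its window, where neighbours of similar intensity weigh more, so costs are smoothed within a surface but not across an intensity edge.

```python
import numpy as np


def adaptive_support_aggregate(cost, image, radius=2, sigma=2.0):
    """Aggregate a per-pixel cost map over a (2*radius+1)^2 window.

    Hypothetical helper for illustration only. Each neighbour q of pixel p
    is weighted by exp(-|I(p) - I(q)| / sigma), so neighbours with similar
    intensity contribute more, which keeps the aggregation edge-preserving.
    """
    cost = np.asarray(cost, dtype=np.float64)
    image = np.asarray(image, dtype=np.float64)
    h, w = cost.shape

    # Replicate borders so every pixel sees a full window.
    cost_p = np.pad(cost, radius, mode="edge")
    img_p = np.pad(image, radius, mode="edge")

    num = np.zeros_like(cost)  # weighted sum of neighbour costs
    den = np.zeros_like(cost)  # sum of weights (>= 1: the centre weight is 1)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            neigh_img = img_p[dy:dy + h, dx:dx + w]
            neigh_cost = cost_p[dy:dy + h, dx:dx + w]
            weight = np.exp(-np.abs(image - neigh_img) / sigma)
            num += weight * neigh_cost
            den += weight
    return num / den
```

With a small `sigma`, a cost map that changes across a strong intensity edge is left essentially untouched at the edge, whereas a plain box filter would blur it; with a very large `sigma` the weights approach 1 and the function reduces to a box filter.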