Gated2Gated: Self-Supervised Depth Estimation from Gated Images

Gated cameras hold promise as an alternative to scanning LiDAR sensors, offering high-resolution 3D depth that is robust to back-scatter in fog, snow, and rain. Instead of sequentially scanning a scene and directly recording depth via the photon time-of-flight, as pulsed LiDAR sensors do, gated imagers encode depth in the relative intensity of a handful of gated slices captured at megapixel resolution. Although existing methods have shown that it is possible to decode high-resolution depth from such measurements, these methods require synchronized and calibrated LiDAR to supervise the gated depth decoder. This requirement prohibits fast adoption across geographies, training on large unpaired datasets, and exploring alternative applications outside of automotive use cases. In this work, we fill this gap and propose an entirely self-supervised depth estimation method that uses gated intensity profiles and temporal consistency as a training signal. The proposed model is trained end-to-end from gated video sequences, does not require LiDAR or RGB data, and learns to estimate absolute depth values. We take gated slices as input and disentangle the estimation of the scene albedo, depth, and ambient light, which are then used to reconstruct the input slices through a cyclic loss. We rely on temporal consistency between a given frame and neighboring gated slices to estimate depth in regions with shadows and reflections. We experimentally validate that the proposed approach outperforms existing supervised and self-supervised depth estimation methods based on monocular RGB and stereo images, as well as supervised methods based on gated images. Code is available at https://github.com/princeton-computationalimaging/Gated2Gated.
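To show concretely how depth can be encoded in the relative intensity of a few gated slices, here is a minimal per-pixel Python sketch. It assumes a simplified forward model I_k = albedo * C_k(z) + ambient, where C_k is the range-intensity profile of slice k. The Gaussian-shaped profiles, the gate centers and widths, and every function name below are hypothetical placeholders chosen for illustration. Real profiles are calibrated for a specific camera, and this is not code from the Gated2Gated repository. After the ambient term is subtracted and the result is normalized across slices, the albedo cancels, so depth can be read from slice ratios alone. The L1 re-rendering error serves as a simple stand-in for a cyclic reconstruction loss.

```python
import math

# HYPOTHETICAL range-intensity profiles C_k(z): Gaussian bumps at assumed gate
# positions. Real profiles are measured per camera and are not Gaussian.
GATE_CENTERS_M = (25.0, 60.0, 110.0)  # assumed gate centers in meters
GATE_WIDTHS_M = (20.0, 30.0, 45.0)    # assumed profile widths in meters


def range_intensity(k: int, z: float) -> float:
    """Assumed profile value C_k(z) for slice k at depth z (meters)."""
    c, w = GATE_CENTERS_M[k], GATE_WIDTHS_M[k]
    return math.exp(-0.5 * ((z - c) / w) ** 2)


def render_slices(albedo: float, depth: float, ambient: float) -> list[float]:
    """Simplified forward model: I_k = albedo * C_k(depth) + ambient."""
    return [albedo * range_intensity(k, depth) + ambient
            for k in range(len(GATE_CENTERS_M))]


def decode_depth(slices: list[float], ambient: float,
                 z_min: float = 1.0, z_max: float = 200.0,
                 steps: int = 2000) -> float:
    """Grid-search the depth whose normalized profile best matches the slices.

    Subtracting ambient light and normalizing the slice vector removes the
    albedo, so only relative intensities determine the recovered depth.
    """
    signal = [s - ambient for s in slices]
    norm = math.sqrt(sum(s * s for s in signal)) or 1.0
    target = [s / norm for s in signal]
    best_z, best_err = z_min, float("inf")
    for i in range(steps + 1):
        z = z_min + (z_max - z_min) * i / steps
        prof = [range_intensity(k, z) for k in range(len(GATE_CENTERS_M))]
        pn = math.sqrt(sum(p * p for p in prof)) or 1.0
        err = sum((t - p / pn) ** 2 for t, p in zip(target, prof))
        if err < best_err:
            best_z, best_err = z, err
    return best_z


def cyclic_loss(slices: list[float], albedo: float, depth: float,
                ambient: float) -> float:
    """Mean absolute error between observed slices and slices re-rendered
    from estimated albedo, depth, and ambient light."""
    recon = render_slices(albedo, depth, ambient)
    return sum(abs(a - b) for a, b in zip(slices, recon)) / len(slices)
```

For example, `decode_depth(render_slices(0.7, 42.0, 0.05), 0.05)` should land within one grid step (about 0.1 m) of 42 m. Lowering the albedo to 0.2 should not change the result. `cyclic_loss` is zero when the estimated albedo, depth, and ambient reproduce the observed slices exactly. In a learned setting, a network would predict these per-pixel quantities in place of the grid search, and a reconstruction loss of this kind would supply the training signal.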
