CoMoDA: Continuous Monocular Depth Adaptation Using Past Experiences

While ground truth depth data remains hard to obtain, self-supervised monocular depth estimation methods enjoy growing attention. Much research in this area aims at improving loss functions or network architectures. Most works, however, do not leverage self-supervision to its full potential. They stick to the standard closed world train-test pipeline, assuming the network parameters to be fixed after the training is finished. Such an assumption does not allow to adapt to new scenes, whereas with self-supervision this becomes possible without extra annotations.In this paper, we propose a novel self-supervised Continuous Monocular Depth Adaptation method (CoMoDA), which adapts the pretrained model on a test video on the fly. As opposed to existing test-time refinement methods that use isolated frame triplets, we opt for continuous adaptation, making use of the previous experience from the same scene. We additionally augment the proposed procedure with the experience from the distant past, preventing the model from overfitting and thus forgetting already learnt information.We demonstrate that our method can be used for both intra- and cross-dataset adaptation. By adapting the model from train to test set of the Eigen split of KITTI, we achieve state-of-the-art depth estimation performance and surpass all existing methods using standard architectures. We also show that our method runs 15 times faster than existing test-time refinement methods. The code is available at https://github.com/Yevkuzn/CoMoDA.

[1]  Toby P. Breckon,et al.  Real-Time Monocular Depth Estimation Using Synthetic Data with Domain Adaptation via Image Style Transfer , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Cordelia Schmid,et al.  SfM-Net: Learning of Structure and Motion from Video , 2017, ArXiv.

[3]  Ian D. Reid,et al.  Unsupervised Learning of Monocular Depth Estimation and Visual Odometry with Deep Feature Reconstruction , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Zhichao Yin,et al.  GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Cordelia Schmid,et al.  Self-Supervised Learning With Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Xiaogang Wang,et al.  Learning Monocular Depth by Distilling Cross-domain Stereo Networks , 2018, ECCV.

[7]  R. Venkatesh Babu,et al.  AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Dacheng Tao,et al.  Deep Ordinal Regression Network for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Wei Xu,et al.  Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Nicu Sebe,et al.  Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Weifeng Chen,et al.  Single-Image Depth Perception in the Wild , 2016, NIPS.

[12]  Luigi di Stefano,et al.  Real-Time Self-Adaptive Deep Stereo , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[14]  Anelia Angelova,et al.  Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Rares Ambrus,et al.  3D Packing for Self-Supervised Monocular Depth Estimation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Liang-Gee Chen,et al.  What Synthesis Is Missing: Depth Adaptation Integrated With Weak Supervision for Indoor Scene Parsing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Luigi di Stefano,et al.  Learning to Adapt for Stereo , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Ali Farhadi,et al.  Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks , 2016, ECCV.

[22]  Stefano Soatto,et al.  Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Anelia Angelova,et al.  Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Dacheng Tao,et al.  Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Dongbing Gu,et al.  UnDeepVO: Monocular Visual Odometry Through Unsupervised Deep Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[27]  Hongbin Zha,et al.  Self-Supervised Deep Visual Odometry With Online Adaptation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Wei Xu,et al.  LEGO: Learning Edge with Geometry all at Once by Watching Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Simon Lucey,et al.  Learning Depth from Monocular Videos Using Direct Methods , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[31]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[32]  Jörg Stückler,et al.  Semi-Supervised Deep Learning for Monocular Depth Map Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Chunhua Shen,et al.  Enforcing Geometric Constraints of Virtual Normal for Depth Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34]  Michael J. Black,et al.  Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Gabriel J. Brostow,et al.  Digging Into Self-Supervised Monocular Depth Estimation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[36]  Ying Li,et al.  Exploiting Temporal Consistency for Real-Time Video Depth Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[37]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[38]  Nicu Sebe,et al.  Online Depth Learning Against Forgetting in Monocular Videos , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Hongdong Li,et al.  Open-World Stereo Video Matching with Deep RNN , 2018, ECCV.

[40]  Anelia Angelova,et al.  Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos , 2018, AAAI.

[41]  Jia-Bin Huang,et al.  DF-Net: Unsupervised Joint Learning of Depth and Flow using Cross-Task Consistency , 2018, ECCV.

[42]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Wei Xu,et al.  Unsupervised Learning of Geometry with Edge-aware Depth-Normal Consistency , 2017, AAAI.

[44]  Nicu Sebe,et al.  Refine and Distill: Exploiting Cycle-Inconsistency and Knowledge Distillation for Unsupervised Monocular Depth Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[46]  Dengxin Dai,et al.  Don’t Forget The Past: Recurrent Depth Estimation from Monocular Video , 2020, IEEE Robotics and Automation Letters.

[47]  Jian Yang,et al.  Online Adaptation through Meta-Learning for Stereo Depth Estimation , 2019, ArXiv.

[48]  Nicu Sebe,et al.  Unsupervised Adversarial Depth Estimation Using Cycled Generative Networks , 2018, 2018 International Conference on 3D Vision (3DV).

[49]  Long Ji Lin,et al.  Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[50]  Stefano Mattoccia,et al.  Generative Adversarial Networks for Unsupervised Monocular Depth Prediction , 2018, ECCV Workshops.

[51]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[52]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.