PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation

Online stereo adaptation tackles the domain shift caused by differences between synthetic (training) and real (test) datasets, so that stereo models can adapt promptly in dynamic real-world applications such as autonomous driving. However, previous methods often fail to handle particular regions related to dynamic objects, where environmental changes are more severe. To mitigate this issue, we propose PointFix, which incorporates an auxiliary point-selective network into a meta-learning framework to provide a robust initialization of stereo models for online stereo adaptation. In a nutshell, our auxiliary network learns to fix local variants intensively by back-propagating local information through the meta-gradient, yielding a robust initialization of the baseline model. The network is model-agnostic, so it can be used with any architecture in a plug-and-play manner. We conduct extensive experiments to verify the effectiveness of our method under three adaptation settings: short-, mid-, and long-term sequences. Experimental results show that proper initialization of the base stereo model by the auxiliary network enables our learning paradigm to achieve state-of-the-art performance at inference.
