Patch attention network with generative adversarial model for semi-supervised binocular disparity prediction

In this paper, we address the challenging points of binocular disparity estimation: (1) unsatisfactory results in the occluded region when utilizing warping function in unsupervised learning; (2) inefficiency in running time and the number of parameters as adopting a lot of 3D convolutions in the feature matching module. To solve these drawbacks, we propose a patch attention network for semi-supervised stereo matching learning. First, we employ a channel-attention mechanism to aggregate the cost volume by selecting its different surfaces for reducing a large number of 3D convolution, called the patch attention network (PA-Net). Second, we use our proposed PA-Net as a generator and then combine it, traditional unsupervised learning loss, and the adversarial learning model to construct a semi-supervised learning framework for improving performance in the occluded areas. We have trained our PA-Net in supervised learning, semi-supervised learning, and unsupervised learning manners. Extensive experiments show that (1) our semi-supervised learning framework can overcome the drawbacks of unsupervised learning and significantly improve the performance in the ill-posed region by using only a few or inaccurate ground truths; (2) our PA-Net can outperform other state-of-the-art approaches in supervised learning and use fewer parameters.

[1]  Hui Huang,et al.  Learning a convolutional neural network for propagation-based stereo image segmentation , 2018, The Visual Computer.

[2]  Bo Li,et al.  Multi-scale Cross-form Pyramid Network for Stereo Matching , 2019, 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA).

[3]  Trevor Darrell,et al.  Hierarchical Discrete Distribution Decomposition for Match Density Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Raquel Urtasun,et al.  Efficient Joint Segmentation, Occlusion Labeling, Stereo and Flow Estimation , 2014, ECCV.

[5]  Concetto Spampinato,et al.  Semi Supervised Semantic Segmentation Using Generative Adversarial Network , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Long Quan,et al.  Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Bo Li,et al.  NLCA-Net: a non-local context attention network for stereo matching , 2020, APSIPA Transactions on Signal and Information Processing.

[9]  Boxin Shi,et al.  A self-calibrated photo-geometric depth camera , 2018, The Visual Computer.

[10]  Liang Tian,et al.  Disparity estimation in stereo video sequence with adaptive spatiotemporally consistent constraints , 2019, The Visual Computer.

[11]  Zhidong Deng,et al.  SegStereo: Exploiting Semantic Information for Disparity Estimation , 2018, ECCV.

[12]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Ulf Assarsson,et al.  A low-cost, practical acquisition and rendering pipeline for real-time free-viewpoint video communication , 2020, The Visual Computer.

[14]  Yunchao Wei,et al.  CCNet: Criss-Cross Attention for Semantic Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Marc Pollefeys,et al.  SGM-Nets: Semi-Global Matching with Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Bo Li,et al.  MSDC-Net: Multi-Scale Dense and Contextual Networks for Stereo Matching , 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[18]  Marc Pollefeys,et al.  Patch Based Confidence Prediction for Dense Disparity Map , 2016, BMVC.

[19]  Bo Li,et al.  Monocular Depth Estimation with Hierarchical Fusion of Dilated CNNs and Soft-Weighted-Sum Inference , 2017, Pattern Recognit..

[20]  Ruigang Yang,et al.  GA-Net: Guided Aggregation Net for End-To-End Stereo Matching , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Chunhua Shen,et al.  Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Anelia Angelova,et al.  Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos , 2018, AAAI.

[23]  Yuchao Dai,et al.  SDBF-Net: Semantic and Disparity Bidirectional Fusion Network for 3D Semantic Detection on Incidental Satellite Images , 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[24]  Luigi di Stefano,et al.  Geometry meets semantics for semi-supervised monocular depth estimation , 2018, ACCV.

[25]  Stephen Lin,et al.  GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[26]  Jörg Stückler,et al.  Semi-Supervised Deep Learning for Monocular Depth Map Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Luigi di Stefano,et al.  Real-Time Self-Adaptive Deep Stereo , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Yong-Sheng Chen,et al.  Pyramid Stereo Matching Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Rongrong Ji,et al.  Semi-Supervised Adversarial Monocular Depth Estimation , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Yan Wang,et al.  Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Jianwei Zhang,et al.  An efficient stereo matching based on fragment matching , 2018, The Visual Computer.

[32]  Andreas Geiger,et al.  Displets: Resolving stereo ambiguities using object knowledge , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Lior Wolf,et al.  Improved Stereo Matching with Constant Highway Networks and Reflective Confidence Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Takayuki Okatani,et al.  Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps With Accurate Object Boundaries , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[35]  Lili Ju,et al.  Semantic Stereo Matching With Pyramid Cost Volumes , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[36]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[38]  Xiaogang Wang,et al.  Group-Wise Correlation Stereo Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Rui Hu,et al.  DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Bo Li,et al.  MVS2: Deep Unsupervised Multi-View Stereo with Multi-View Symmetry , 2019, 2019 International Conference on 3D Vision (3DV).

[42]  Jinglin Zhang,et al.  A simplified ICA-based local similarity stereo matching , 2020, The Visual Computer.

[43]  H. Hirschmüller Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information , 2005, CVPR.

[44]  Hongdong Li,et al.  Unsupervised Deep Epipolar Flow for Stationary or Dynamic Scenes , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Guan Huang,et al.  Attention-Guided Unified Network for Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[47]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Hongdong Li,et al.  Noise-Aware Unsupervised Deep Lidar-Stereo Fusion , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Torsten Sattler,et al.  A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[51]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Nikolai Smolyanskiy,et al.  On the Importance of Stereo for Accurate Depth Estimation: An Efficient Semi-Supervised Deep Neural Network Approach , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).