Deep RGB-D Canonical Correlation Analysis For Sparse Depth Completion

In this paper, we propose the Correlation For Completion Network (CFCNet), an end-to-end deep learning model that exploits the correlation between two data sources to perform sparse depth completion. CFCNet learns to capture, to the largest extent possible, the semantically correlated features between RGB and depth information. Through pairs of image pixels and the visible measurements in a sparse depth map, CFCNet facilitates feature-level mutual transformation between the two data sources. This transformation enables CFCNet to predict features and reconstruct the missing depth measurements from their corresponding, transformed RGB features. We extend canonical correlation analysis to the 2D domain and formulate it as one of our training objectives (2D deep canonical correlation analysis, or the "2D2CCA loss"). Extensive experiments validate the ability and flexibility of CFCNet compared to state-of-the-art methods on both indoor and outdoor scenes with different real-life sparsity patterns. Code is available at: this https URL.
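To make the training objective concrete, the following is a minimal sketch of the classical (1D) CCA correlation measure that the 2D2CCA loss generalizes: given two views of paired samples, whiten each view's covariance and take the singular values of the cross-covariance, which are the canonical correlations. This is an illustrative reconstruction of standard CCA, not the paper's actual 2D2CCA implementation; the function name, the regularization term `reg`, and the toy dimensions are all assumptions for the sketch.

```python
import numpy as np

def cca_correlation(X, Y, reg=1e-4):
    """Sum of canonical correlations between two views X, Y (n_samples x dim).

    Classical CCA: whiten each view, then the canonical correlations are the
    singular values of T = Sxx^{-1/2} Sxy Syy^{-1/2}. A deep-CCA style loss
    maximizes this quantity (i.e., minimizes its negative). The small `reg`
    term keeps the covariance matrices invertible (an assumption here).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each view
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse square root of a symmetric positive-definite matrix
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(T, compute_uv=False).sum()
```

The 2D extension used by CFCNet replaces the vectorized samples with 2D feature maps and learns left/right projection matrices, which avoids flattening spatial structure; the whitening-then-correlating pattern above remains the core idea.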
