Deep Learning for Confidence Information in Stereo and ToF Data Fusion

This paper proposes a novel framework for fusing depth data produced by a Time-of-Flight (ToF) camera and a stereo vision system. The key problem of balancing the two sources of information is addressed by using deep learning to extract a confidence map for each source. We introduce a novel synthetic dataset that accurately represents the data acquired by the proposed setup and use it to train a Convolutional Neural Network architecture, which estimates the reliability of both data sources at each pixel location. The two depth fields are finally fused by enforcing the local consistency of the depth data while taking the confidence information into account. Experimental results show that the proposed approach increases the accuracy of the depth estimation.
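To illustrate how per-pixel confidence can balance the two sources, the following is a minimal sketch and not the method of the paper: it replaces the local-consistency fusion step with a plain confidence-weighted average at each pixel. The function name `fuse_depth`, the `eps` threshold, and the assumption that confidences are non-negative values such as those in [0, 1] are all hypothetical choices made for the example.

```python
from typing import List

Grid = List[List[float]]  # a depth or confidence map stored as rows of floats


def fuse_depth(d_tof: Grid, c_tof: Grid,
               d_stereo: Grid, c_stereo: Grid,
               eps: float = 1e-6) -> Grid:
    """Fuse two depth maps by a per-pixel confidence-weighted average.

    Simplified stand-in for illustration only: the paper fuses the depth
    fields by enforcing local consistency, which this sketch does not do.
    All four maps are assumed to share the same resolution and viewpoint.
    """
    fused: Grid = []
    for row_dt, row_ct, row_ds, row_cs in zip(d_tof, c_tof, d_stereo, c_stereo):
        out_row = []
        for dt, ct, ds, cs in zip(row_dt, row_ct, row_ds, row_cs):
            total = ct + cs
            if total < eps:
                # Neither source is trusted: fall back to the plain mean.
                out_row.append(0.5 * (dt + ds))
            else:
                # The more confident source dominates the fused value.
                out_row.append((ct * dt + cs * ds) / total)
        fused.append(out_row)
    return fused


if __name__ == "__main__":
    tof = [[1.0, 2.0]]
    stereo = [[3.0, 4.0]]
    # First pixel trusts only ToF, second pixel trusts only stereo.
    print(fuse_depth(tof, [[1.0, 0.0]], stereo, [[0.0, 1.0]]))
```

In this simplified form each pixel is fused independently, so noisy confidence values pass straight into the result; taking neighboring pixels into account, as the local-consistency step described above does, is one way to reduce that sensitivity.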
