Fusion of Kinect depth data with trifocal disparity estimation for near real-time, high-quality depth map generation

Generating depth maps along with video streams is valuable for cinema and television production. Thanks to improvements in depth acquisition systems, the fusion of depth sensing and disparity estimation has become a widely investigated challenge in computer vision. This paper presents a new framework for generating depth maps from a rig made of a professional camera, two satellite cameras, and a Kinect device. A new disparity-based calibration method is proposed so that registered Kinect depth samples become perfectly consistent with disparities estimated between rectified views. A new hierarchical fusion approach is also proposed for combining on-the-fly depth sensing and disparity estimation in order to circumvent their respective weaknesses. Depth is determined by minimizing a global energy criterion that takes into account both the matching reliability and the consistency with the Kinect input. The resulting depth maps are relevant in both uniform and textured areas, without holes due to occlusions or structured-light shadows. Our GPU implementation reaches 20 fps when generating quarter-pel-accurate HD720p depth maps along with the main view, which is close to real-time performance for video applications. The estimated depth is of high quality and suitable for 3D reconstruction or virtual view synthesis.
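To illustrate the fusion principle described above, here is a minimal per-pixel sketch of combining a stereo matching cost with a consistency term toward registered Kinect disparities. This is a simplified, hypothetical illustration, not the authors' method: the paper minimizes a global energy hierarchically on the GPU, whereas this sketch takes a per-pixel winner-take-all over the combined cost and omits spatial smoothness. All names (`fuse_depth`, `lam`) and the quadratic consistency penalty are assumptions for exposition.

```python
import numpy as np

def fuse_depth(cost_volume, kinect_disp, kinect_valid, lam=0.1):
    """Per-pixel fusion sketch (illustrative, not the paper's algorithm).

    cost_volume : (n_disp, H, W) stereo matching cost per disparity hypothesis
    kinect_disp : (H, W) Kinect depth registered and converted to disparity
    kinect_valid: (H, W) 1.0 where a Kinect sample exists, 0.0 in shadow/holes
    lam         : weight of the Kinect consistency term (assumed parameter)

    Energy per pixel and disparity d:
        E(d) = C_stereo(d) + lam * valid * (d - d_kinect)^2
    The Kinect term disambiguates uniform areas where the stereo cost is
    flat; where Kinect has no sample, stereo matching decides alone.
    """
    n_disp = cost_volume.shape[0]
    d = np.arange(n_disp, dtype=np.float32).reshape(-1, 1, 1)
    consistency = lam * kinect_valid * (d - kinect_disp) ** 2
    energy = cost_volume + consistency          # broadcast over (n_disp, H, W)
    return np.argmin(energy, axis=0)            # winner-take-all disparity map
```

In a textureless region the stereo cost is nearly constant over disparities, so the Kinect term selects the depth; conversely, masking out invalid Kinect samples (structured-light shadows) lets the estimated disparities fill those holes.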
