RGB-D sensor data correction and enhancement by introduction of an additional RGB view

RGB-D sensors are becoming increasingly vital to robotics. Sensors such as the Microsoft Kinect and time-of-flight cameras provide colored 3D point clouds in real time and can play a crucial role in robot vision. However, these sensors suffer from precision deficiencies, and the density of the point clouds they provide is often insufficient. In this paper, we present a multi-camera system for correcting and enhancing the data acquired from an RGB-D sensor. Our system consists of two sensors: the RGB-D sensor (main sensor) and a regular RGB camera (auxiliary sensor). We correct and enhance the data acquired from the RGB-D sensor by placing the auxiliary sensor in close proximity to the target object and taking advantage of the established epipolar geometry. We have reduced the relative error of the raw point cloud from a Microsoft Kinect RGB-D sensor by 74.5% and increased its density up to 2.5 times.
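
To make the geometric idea concrete, the sketch below shows one plausible way to exploit the epipolar geometry between the RGB-D sensor's color view and the auxiliary RGB camera: match features across the two views, estimate the essential matrix, recover the relative pose, and triangulate the inlier correspondences so the resulting 3D points can be compared against, or fused with, the raw depth data. This is a minimal OpenCV sketch, not the authors' implementation; it assumes both cameras are calibrated and, for simplicity, share the intrinsic matrix `K`, and all function and variable names are illustrative.

```python
# Minimal sketch (not the paper's exact pipeline): estimate the epipolar
# geometry between the RGB-D sensor's color view and an auxiliary RGB
# camera, then triangulate feature matches to obtain an independent set
# of 3D points. Assumes both cameras are calibrated and, for simplicity,
# share the intrinsic matrix K.
import cv2
import numpy as np

def triangulate_from_aux_view(img_main, img_aux, K):
    # 1. Detect and match local features across the two RGB views.
    gray1 = cv2.cvtColor(img_main, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(img_aux, cv2.COLOR_BGR2GRAY)
    detector = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = detector.detectAndCompute(gray1, None)
    kp2, des2 = detector.detectAndCompute(gray2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 2. Estimate the essential matrix (epipolar geometry) with RANSAC
    #    and recover the relative pose of the auxiliary camera.
    E, mask = cv2.findEssentialMat(pts1, pts2, K,
                                   method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, mask_pose = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # 3. Triangulate the inlier correspondences. NOTE: t from recoverPose
    #    is defined only up to scale; the RGB-D sensor's metric depth could
    #    anchor the scale (an illustrative assumption, not the paper's
    #    stated method).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    inliers = mask_pose.ravel().astype(bool)
    pts4d = cv2.triangulatePoints(P1, P2, pts1[inliers].T, pts2[inliers].T)
    pts3d = (pts4d[:3] / pts4d[3]).T  # homogeneous -> Euclidean
    return pts3d
```

The triangulated points are one hypothetical way to cross-check noisy depth readings and to add points in regions where the raw point cloud is sparse, which matches the correction-and-densification goal the abstract describes.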
