Accurate depth-color scene modeling for 3D contents generation with low cost depth cameras

In this paper, we present a depth-color scene modeling strategy for indoors 3D contents generation. It combines depth and visual information provided by a low-cost active depth camera to improve the accuracy of the acquired depth maps considering the different dynamic nature of the scene elements. Accurate depth and color models of the scene background are iteratively built, and used to detect moving elements in the scene. The acquired depth data is continuously processed with an innovative joint-bilateral filter that efficiently combines depth and visual information thanks to the analysis of an edge-uncertainty map and the detected foreground regions. The main advantages of the proposed approach are: removing depth maps spatial noise and temporal random fluctuations; refining depth data at object boundaries, generating iteratively a robust depth and color background model and an accurate moving object silhouette.

[1]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[2]  Richard Szeliski,et al.  Digital photography with flash and no-flash image pairs , 2004, ACM Trans. Graph..

[3]  Luis Salgado,et al.  Efficient spatio-temporal hole filling strategy for Kinect depth maps , 2012, Electronic Imaging.

[4]  Roberto Manduchi,et al.  Bilateral filtering for gray and color images , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[5]  Qi Tian,et al.  Statistical modeling of complex backgrounds for foreground object detection , 2004, IEEE Transactions on Image Processing.

[6]  Patrick Benavidez,et al.  Mobile robot navigation and target tracking system , 2011, 2011 6th International Conference on System of Systems Engineering.

[7]  Alon Lerner,et al.  Enhanced interactive gaming by blending full-body tracking and gesture animation , 2010, SA '10.

[8]  Dmitriy Vatolin,et al.  Temporal filtering for depth maps generated by Kinect depth camera , 2011, 2011 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON).

[9]  Nathan Silberman,et al.  Indoor scene segmentation using a structured light sensor , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[10]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[11]  Jakob Wasza,et al.  Real-time preprocessing for dense 3-D range imaging on the GPU: Defect interpolation, bilateral temporal averaging and guided filtering , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[12]  Michael F. Cohen,et al.  Digital photography with flash and no-flash image pairs , 2004, ACM Trans. Graph..