Efficient spatio-temporal hole filling strategy for Kinect depth maps

In this paper we present an efficient hole filling strategy that improves the quality of the depth maps obtained with the Microsoft Kinect device. The proposed approach is based on a joint-bilateral filtering framework that incorporates both spatial and temporal information. Missing depth values are obtained by iteratively applying a joint-bilateral filter to their neighboring pixels. The filter weights are selected according to three factors: visual data, depth information and a temporal-consistency map. Video and depth data are combined to improve depth map quality in the presence of edges and homogeneous regions. The temporal-consistency map is generated to track the reliability of the depth measurements near the hole regions. The filled depth values are fed back into the filtering process for successive frames, so the accuracy of the depth values in the hole regions increases as new samples are acquired and filtered.
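As a rough illustration of the iterative filling step described above, the following sketch fills zero-valued depth pixels with a weighted average of their valid neighbors. It is a minimal sketch under stated assumptions, not the filter proposed in the paper: the Gaussian weight forms, the parameters `radius`, `sigma_s`, `sigma_c` and `fill_rel`, the use of a single-channel `guide` image for the visual data, and the use of a `reliability` array as a stand-in for the temporal-consistency map are all hypothetical choices. The depth-similarity term mentioned in the abstract is omitted because the abstract does not say how it is evaluated when the center pixel has no depth.

```python
import numpy as np

def fill_holes_jbf(depth, guide, reliability, radius=2,
                   sigma_s=2.0, sigma_c=10.0, fill_rel=0.5, max_iters=10):
    """Iteratively fill holes (depth == 0) from valid neighbours.

    All parameter names and weight formulas are illustrative assumptions.
    """
    depth = depth.astype(np.float64).copy()
    guide = guide.astype(np.float64)
    rel = reliability.astype(np.float64).copy()
    h, w = depth.shape

    for _ in range(max_iters):
        holes = np.argwhere(depth == 0)
        if holes.size == 0:
            break
        new_depth, new_rel = depth.copy(), rel.copy()
        for y, x in holes:
            # Clip the filter window to the image borders.
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            d = depth[y0:y1, x0:x1]
            valid = d > 0
            if not valid.any():
                continue  # no usable neighbours yet; retry next iteration
            yy, xx = np.mgrid[y0:y1, x0:x1]
            # Spatial closeness to the hole pixel.
            w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            # Visual similarity in the guide image (keeps edges sharp).
            w_c = np.exp(-((guide[y0:y1, x0:x1] - guide[y, x]) ** 2)
                         / (2 * sigma_c ** 2))
            # Reliability down-weights less trustworthy measurements.
            wgt = w_s * w_c * rel[y0:y1, x0:x1] * valid
            total = wgt.sum()
            if total > 0:
                new_depth[y, x] = (wgt * d).sum() / total
                new_rel[y, x] = fill_rel  # assumed: filled values are less reliable
        depth, rel = new_depth, new_rel
    return depth, rel
```

In a video setting, the returned depth and reliability arrays could be carried over to the next frame so that filled values are refined as new measurements arrive; how the paper updates its temporal-consistency map is not specified in the abstract.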
