A Framework for Fast Low-Power Multi-sensor 3D Scene Capture and Reconstruction

We present a computational framework that combines depth and colour (texture) modalities for 3D scene reconstruction. The scene depth is captured by a low-power photonic mixer device (PMD) employing the time-of-flight principle, while the colour (2D) data is captured by a high-resolution RGB sensor. Such a 3D capture setting is instrumental in 3D face recognition tasks, more specifically in depth-guided image segmentation, 3D face reconstruction, and pose modification and normalization, which are important pre-processing steps prior to feature extraction and recognition. The two captured modalities have different spatial resolutions and need to be aligned and fused so as to form what is known as the view-plus-depth, or RGB-Z, 3D scene representation. We specifically discuss the low-power operation mode of the system, in which the depth data is very noisy and needs to be effectively denoised before being fused with the colour data. We propose a modification of the non-local means (NLM) denoising approach which, in our framework, operates on complex-valued data, thus providing a degree of robustness against low-light capture conditions and adaptivity to the scene content. Further, our approach applies a bilateral filter to the range point-cloud data, ensuring a very good starting point for the data fusion step. The fusion step is based on the iterative Richardson method, which is applied for efficient non-uniform to uniform resampling of the depth data using structural information from the colour data. We demonstrate a real-time GPU-based implementation of the framework, which yields high-quality 3D scene reconstruction suitable for face normalization and recognition.
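The idea of running NLM on complex-valued ToF data can be illustrated with a minimal Python/NumPy sketch. It assumes, for illustration only, that each PMD pixel is represented by the complex number A·exp(iφ) built from the measured amplitude A and phase φ, that depth follows from phase as d = c·φ / (4π·f_mod), and that the classic NLM weights of Buades et al. are applied to complex patch distances; the actual modification used in the framework may differ. The names `complex_nlm`, `phase_to_depth` and `F_MOD`, the 20 MHz modulation frequency, and all demo values are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s
F_MOD = 20e6       # hypothetical modulation frequency, Hz (assumption)


def complex_nlm(z, patch_radius=1, search_radius=3, h=0.5):
    """Plain non-local means on a 2D complex array z (assumed A * exp(1j * phi)).

    Patch similarity is the mean squared magnitude of the complex difference,
    mapped to a weight exp(-dist / h**2); the output is the weighted average
    of the complex centre samples within the search window.
    """
    rows, cols = z.shape
    p, s = patch_radius, search_radius
    pad = p + s
    zp = np.pad(z, pad, mode="reflect")  # mirror borders so every patch fits
    out = np.zeros_like(z, dtype=complex)
    for i in range(rows):
        for j in range(cols):
            ci, cj = i + pad, j + pad
            ref = zp[ci - p:ci + p + 1, cj - p:cj + p + 1]
            weight_sum, acc = 0.0, 0j
            for di in range(-s, s + 1):
                for dj in range(-s, s + 1):
                    ni, nj = ci + di, cj + dj
                    cand = zp[ni - p:ni + p + 1, nj - p:nj + p + 1]
                    dist = np.mean(np.abs(ref - cand) ** 2)
                    w = np.exp(-dist / h**2)
                    weight_sum += w
                    acc += w * zp[ni, nj]  # average complex values, not depths
            out[i, j] = acc / weight_sum
    return out


def phase_to_depth(z, f_mod=F_MOD):
    """Map the phase of complex ToF samples to depth: d = c * phi / (4 * pi * f_mod)."""
    phi = np.mod(np.angle(z), 2 * np.pi)  # wrap phase into [0, 2*pi)
    return C * phi / (4 * np.pi * f_mod)


# Synthetic demo: a flat surface at 1.5 m seen through heavy complex noise,
# standing in for a low-power capture (all values are illustrative).
rng = np.random.default_rng(0)
true_depth = np.full((16, 16), 1.5)
phi = 4 * np.pi * F_MOD * true_depth / C
z_noisy = 1.0 * np.exp(1j * phi) + 0.3 * (
    rng.standard_normal(phi.shape) + 1j * rng.standard_normal(phi.shape))
z_denoised = complex_nlm(z_noisy)

rmse_noisy = np.sqrt(np.mean((phase_to_depth(z_noisy) - true_depth) ** 2))
rmse_denoised = np.sqrt(np.mean((phase_to_depth(z_denoised) - true_depth) ** 2))
print(f"depth RMSE noisy: {rmse_noisy:.3f} m, denoised: {rmse_denoised:.3f} m")
```

Averaging complex samples instead of depths means that low-amplitude, less reliable pixels contribute short vectors and therefore pull the resulting phase less than high-amplitude ones, which is one plausible reading of the robustness to low-light conditions claimed above. The nested loops are kept for readability; a real-time version would vectorize them or move them to the GPU.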
