An Integrated Platform for Live 3D Human Reconstruction and Motion Capturing

Recent developments in 3D capturing, processing, and rendering open up novel 3D application pathways. This paper describes the main elements of an integrated platform that targets tele-immersion and future 3D applications, addressing real-time capturing, robust 3D human shape and appearance reconstruction, and skeleton-based motion tracking. First, the details of a multiple RGB-depth (RGB-D) capturing system are given, along with a novel sensor calibration method. A robust, fast reconstruction method from multiple RGB-D streams is then proposed, based on an enhanced variation of the volumetric Fourier transform-based method, parallelized on the Graphics Processing Unit (GPU), and accompanied by an appropriate texture-mapping algorithm. In addition, given the lack of relevant objective evaluation methods, a novel framework is proposed for the quantitative evaluation of real-time 3D reconstruction systems. Finally, a generic, multiple depth stream-based method for accurate real-time human skeleton tracking is proposed. Detailed experimental results with multi-Kinect2 data sets verify the validity of our arguments and the effectiveness of the proposed system and methodologies.
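
To make the multi-sensor setting concrete, the following minimal sketch shows how depth maps from several calibrated RGB-D sensors could be brought into one common coordinate frame before any reconstruction step. It is an illustration under stated assumptions, not the calibration or reconstruction method of this paper: it assumes a standard pinhole camera model, and the function names (`backproject`, `to_world`, `fuse`) as well as the intrinsics and extrinsics used are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def backproject(depth: Sequence[Sequence[float]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Convert a depth map (metres, 0 = no reading) into camera-frame 3D points.

    Assumes a pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy); these parameters are illustrative, not from the paper.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels without a valid depth reading
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_world(points: Sequence[Point],
             rotation: Matrix3, translation: Sequence[float]) -> List[Point]:
    """Apply a rigid transform p_world = R * p_cam + t to every point."""
    world: List[Point] = []
    for p in points:
        world.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return world


def fuse(views) -> List[Point]:
    """Merge several views into one point cloud in the world frame.

    Each view is (depth_map, (fx, fy, cx, cy), rotation, translation),
    where rotation and translation are that sensor's assumed extrinsics.
    """
    cloud: List[Point] = []
    for depth, intrinsics, rotation, translation in views:
        cloud.extend(to_world(backproject(depth, *intrinsics),
                              rotation, translation))
    return cloud
```

In this sketch each sensor contributes its own extrinsic pair (rotation, translation), which is what a multi-sensor calibration procedure would estimate. The fused cloud is only a possible input to a volumetric reconstruction stage; the Fourier transform-based reconstruction and texture mapping described above are not reproduced here.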
