Consumer Depth Cameras for Computer Vision

We analyze Kinect as a 3D measuring device, experimentally investigate its depth measurement resolution and error properties, and quantitatively compare its accuracy with stereo reconstruction from SLR cameras and with a 3D time-of-flight (3D-TOF) camera. We propose a geometrical model of Kinect and a calibration procedure that accurately calibrates both the Kinect 3D measurement and the Kinect cameras. We compare our calibration procedure with the alternatives available on the Internet, and integrate it into a structure-from-motion (SfM) pipeline in which 3D measurements from a moving Kinect are transformed into a common coordinate system by computing relative poses from matches in its color camera.
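The last step, bringing depth measurements from a moving sensor into one coordinate system, can be illustrated with a minimal sketch. It assumes an ideal pinhole depth camera without lens distortion and a known rigid pose (R, t) for each frame; it is not the calibration model or pipeline proposed here. The function names `back_project` and `to_common_frame`, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the example pose are all hypothetical values chosen for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point:
    """Back-project pixel (u, v) with metric depth into camera coordinates.

    Assumes an ideal pinhole model: focal lengths fx, fy and principal
    point (cx, cy) in pixels, no lens distortion.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_common_frame(p: Point,
                    R: Sequence[Sequence[float]],
                    t: Sequence[float]) -> Point:
    """Apply the rigid transform X_common = R * X_cam + t."""
    return tuple(
        sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
    )


# Hypothetical intrinsics and pose, for illustration only.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

# Example pose: 90 degree rotation about the y axis, then a shift along x.
R_EXAMPLE = [[0.0, 0.0, 1.0],
             [0.0, 1.0, 0.0],
             [-1.0, 0.0, 0.0]]
T_EXAMPLE = [1.0, 0.0, 0.0]

point_cam = back_project(420.0, 240.0, 2.0, FX, FY, CX, CY)
point_common = to_common_frame(point_cam, R_EXAMPLE, T_EXAMPLE)
```

In an SfM setting the per-frame pose would be estimated from feature matches in the color camera rather than given, and a calibrated sensor model would replace the ideal pinhole intrinsics used above.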
