Performance Assessment and Calibration of the Kinect 2.0 Time-of-Flight Range Camera for Use in Motion Capture Applications

SUMMARY Robust, three-dimensional (3D) geometric information is a powerful analytical tool, and it is of interest to determine the size, shape, and geometric properties of objects in the real world when performing pre-mission surveys, deformation analyses, or motion capture analyses. Traditional photogrammetric reconstruction techniques require multiple sensors and retroreflective markers or targets to be placed on objects of interest during data acquisition. The Microsoft Kinect 2.0 sensor provides an onboard time-of-flight (ToF) ranging sensor based on the Canesta technology. Given that the Kinect 2.0 costs $200 USD, it shows potential to become a cost-effective, single-sensor solution for capturing full 3D geometric information in place of costly, multi-sensor techniques requiring invasive or otherwise difficult-to-place markers. This study examines the performance characteristics and calibration of the Kinect 2.0 sensor in order to determine the feasibility of its use in 3D imaging applications, particularly human motion capture. The Kinect 2.0 sensor was tested under controlled conditions in order to determine the warm-up time, distance measurement precision, target reflectivity dependencies, residual systematic errors, and the quality of human body reconstruction when compared to a device of known quality. The sensor proved promising, showing precision similar to other ToF imaging systems at a fraction of the price. Over the course of this testing, it was found that negligible warm-up time is required before the geometric measurement performance stabilizes. Furthermore, a distance measurement precision of approximately 1.5 mm is achievable when imaging highly reflective, diffuse target surfaces. Beyond the performance characteristics of the sensor itself, a self-calibration of the sensor for un-modelled lens distortions improved image measurement residuals by an average of 88%, and likewise improved the range measurement precision by 81%. Despite these results, factors beyond the user’s control, such as scene-dependent distortions and inhomogeneity in depth accuracy across the image plane, limit the potential performance of the sensor. Thus, the following “best practice” guidelines were put forth: 1) use only the inner 300x300 pixels about the centre of the sensor, due to loss of signal strength near the periphery of the image; 2) ensure that the object of interest is within the foreground of the scene, ideally at a range of approximately 1-2.5 m from the sensor; and 3) prefer highly reflective, diffuse objects over darker or shiny objects in the captured scene.
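To make the reported precision figure concrete, the sketch below shows one common way such a per-pixel range precision could be estimated: the temporal standard deviation over a stack of repeated depth frames of a static scene, restricted to the central 300x300 window recommended above. This is a minimal illustration, not the authors' processing code; the frame loader, array layout (424x512 for the Kinect 2.0 depth image, in millimetres), and file handling are assumptions made for the example only.

```python
import numpy as np

def range_precision(depth_stack_mm: np.ndarray) -> np.ndarray:
    """Per-pixel 1-sigma range precision (mm) from repeated static frames.

    depth_stack_mm: array of shape (n_frames, height, width), depths in mm,
    with 0 marking invalid returns (assumed convention).
    """
    stack = depth_stack_mm.astype(np.float64)
    stack[stack == 0] = np.nan            # ignore dropouts / invalid pixels
    return np.nanstd(stack, axis=0)       # temporal standard deviation per pixel

def central_crop(img: np.ndarray, size: int = 300) -> np.ndarray:
    """Crop the central size x size window, per the 'inner 300x300 pixels' guideline."""
    h, w = img.shape[-2:]
    r0, c0 = (h - size) // 2, (w - size) // 2
    return img[..., r0:r0 + size, c0:c0 + size]

# Hypothetical usage: 100 frames of a static, diffuse white target at ~1.5 m
# frames = np.stack([load_depth_frame(i) for i in range(100)])  # loader is an assumption
# sigma = range_precision(central_crop(frames))
# print(f"median precision: {np.nanmedian(sigma):.2f} mm")
```

Under the conditions described in the summary (a highly reflective, diffuse target in the 1-2.5 m range, central pixels only), the resulting per-pixel standard deviations would be expected to sit near the ~1.5 mm figure quoted above.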