Improving the Kinect by Cross-Modal Stereo

The introduction of the Microsoft Kinect sensor has stirred significant interest in the robotics community. While originally developed as a gaming interface, its high-quality depth sensor and affordable price have made it a popular choice for robotic perception. Its active sensing strategy is well suited to producing robust, high-frame-rate depth maps for human pose estimation, but the shift to the robotics domain exposes the sensor to a wider set of operating conditions than it was originally designed for. In particular, the sensor fails completely on transparent and specular surfaces, which are common among everyday household objects. As these items are of great interest in home robotics and assistive technologies, we investigate methods to reduce, and sometimes even eliminate, these failures without any modification of the hardware. Specifically, we complement the Kinect's depth estimate with a cross-modal stereo path obtained by disparity matching between the Kinect's built-in IR and RGB sensors. We investigate how the RGB channels can be combined to best mimic the image response of the IR sensor, both via an early fusion scheme that computes a weighted combination of the channels and via a late fusion scheme that computes stereo matches against each channel independently. We show a strong improvement in the reliability of the depth estimate as well as improved performance on an object segmentation task in a tabletop scenario.
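To make the two fusion schemes concrete, here is a minimal sketch using OpenCV's semi-global block matcher. The channel weights `W`, the SGBM parameters, and the pixel-wise median used to merge the late-fusion maps are illustrative assumptions, not the values or matcher used in the paper; the IR and RGB images are assumed to be rectified to a common epipolar geometry, with the IR image playing the role of the left view.

```python
import numpy as np
import cv2

# Hypothetical channel weights for the early-fusion scheme: a weighted sum
# of the R, G, B channels meant to approximate the IR sensor's spectral
# response. Illustrative values only, not the weights from the paper.
W = np.array([0.6, 0.3, 0.1], dtype=np.float32)

# A generic semi-global block matcher as a stand-in; the paper does not
# prescribe this particular stereo algorithm or these parameters.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)


def pseudo_ir(rgb):
    """Early fusion: collapse an H x W x 3 uint8 image (RGB channel order)
    into a single pseudo-IR channel via a weighted sum of the channels."""
    return (rgb.astype(np.float32) @ W).clip(0, 255).astype(np.uint8)


def early_fusion_disparity(ir, rgb):
    """Cross-modal disparity between the IR image and the fused RGB image.
    SGBM returns fixed-point disparities scaled by 16."""
    return matcher.compute(ir, pseudo_ir(rgb)).astype(np.float32) / 16.0


def late_fusion_disparity(ir, rgb):
    """Late fusion: match the IR image against each RGB channel
    independently, then merge the per-channel disparity maps (here with a
    pixel-wise median, one simple merging choice among several)."""
    maps = [matcher.compute(ir, rgb[:, :, c]).astype(np.float32) / 16.0
            for c in range(3)]
    return np.median(np.stack(maps), axis=0)
```

In this sketch, early fusion pays the matching cost once on a single synthesized channel, while late fusion runs the matcher three times and defers the channel combination to the disparity domain, which is why the two schemes can trade accuracy against runtime differently.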
