Change Their Perception: RGB-D for 3-D Modeling and Recognition

RGB-D cameras, such as the Microsoft Kinect, are active sensors that provide high-resolution, dense color and depth information at real-time frame rates. The wide availability of affordable RGB-D cameras is causing a revolution in perception and changing the landscape of robotics and related fields. RGB-D perception has attracted a great deal of attention and many research efforts across various fields in the last three years. In this article, we summarize and discuss our ongoing research on the promising uses of RGB-D in three-dimensional (3-D) mapping and 3-D recognition. By combining the strengths of optical cameras and laser rangefinders, the joint use of color and depth in RGB-D sensing makes visual perception more robust and efficient, leading to practical systems that build detailed 3-D models of large indoor spaces, as well as systems that reliably recognize everyday objects in complex scenes. RGB-D perception is still a burgeoning technology: a rapidly growing number of research projects are being conducted on or using RGB-D perception while RGB-D hardware quickly improves. We believe that RGB-D perception will take center stage in perception and, by making robots see much better than before, will enable a variety of perception-based research and applications.
