A category-level 3-D object dataset: Putting the Kinect to work

The recent proliferation of an inexpensive but high-quality depth sensor, the Microsoft Kinect, has brought to the fore the need for a challenging category-level 3D object detection dataset. We review existing 3D datasets and find that they lack variation in scenes, categories, instances, and viewpoints. We present a dataset of color and depth image pairs gathered in real domestic and office environments. It currently includes over 50 classes, and a crowd-sourced collection effort continues to add images. We establish baseline performance on a PASCAL VOC-style detection task and suggest two ways in which the inferred world size of an object may be used to improve detection. The dataset and annotations can be downloaded at http://www.kinectdata.com.
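
The abstract does not say how the world size of an object is inferred. As a rough illustration only, one common approach is to back-project a 2D detection box through a pinhole camera model using the depth measured at the object. The sketch below assumes that model; the function name `inferred_world_size`, the focal length value, and the example box are hypothetical and are not taken from the paper.

```python
def inferred_world_size(box_px, depth_m, focal_px):
    """Approximate physical width and height (in metres) of a detection box.

    Assumes a pinhole camera: an extent of p pixels at depth z metres
    corresponds to roughly p * z / f metres, where f is the focal length
    expressed in pixels. This is an illustrative assumption, not the
    method described in the paper.
    """
    x_min, y_min, x_max, y_max = box_px
    width_m = (x_max - x_min) * depth_m / focal_px
    height_m = (y_max - y_min) * depth_m / focal_px
    return width_m, height_m


# Hypothetical focal length in pixels; a real sensor needs calibration.
FOCAL_PX = 525.0

# A 210 x 210 pixel box observed at 2.5 m depth.
width_m, height_m = inferred_world_size((100, 50, 310, 260), 2.5, FOCAL_PX)
print(width_m, height_m)
```

A detector could then compare such an estimate against the typical physical size of a category, for example to down-weight candidate boxes whose implied size is implausible.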
