Exploiting and modeling local 3D structure for predicting object locations

In this paper, we argue that there is a strong correlation between local 3D structure and object placement in everyday scenes. We call the local 3D structure around an object its 3D context. In previous work, 3D context is typically hand-coded and limited to flat horizontal surfaces. In contrast, we propose a more general model of 3D context and learn the relationship between 3D context and different object classes. This way, we can capture more complex 3D contexts without implementing specialized routines. We present extensive experiments with both qualitative and quantitative evaluations of our method for different object classes. We show that our method can be used in conjunction with an object detection algorithm to reduce the rate of false positives. Our results support the claim that the 3D structure surrounding objects in everyday scenes is a strong indicator of their placement, and that it can significantly improve the performance of, for example, an object detection system. For evaluation, we have collected a large dataset of Microsoft Kinect frames from five different locations, which we also make publicly available.
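As a rough illustration of how a learned 3D context score might be used alongside an object detector to reduce false positives, consider the minimal sketch below. The abstract does not specify how the two are combined, so everything here is an assumption for illustration: the `Detection` type, the `context_prob` field (standing in for the output of a learned 3D context model), the simple thresholding rule, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str            # object class proposed by the detector
    score: float          # detector confidence in [0, 1]
    context_prob: float   # hypothetical 3D-context plausibility in [0, 1]


def filter_by_context(detections, min_context=0.5):
    """Keep only detections whose surrounding 3D structure is plausible
    for the proposed class (assumed rule: a plain threshold)."""
    return [d for d in detections if d.context_prob >= min_context]


# Two made-up detections with equal-looking detector confidence.
detections = [
    Detection("cup", 0.90, 0.80),  # supported by a table-like structure
    Detection("cup", 0.85, 0.10),  # implausible 3D context, likely false positive
]

kept = filter_by_context(detections)
```

In this sketch the second detection is discarded even though its detector score is high, which is the kind of false-positive reduction the abstract describes. A real system could instead fuse the two scores rather than apply a hard threshold.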
