Saliency-based object discovery on RGB-D data with a late-fusion approach

We present a novel method based on saliency and segmentation to generate generic object candidates from RGB-D data. Our method uses saliency as a cue to roughly estimate the location and extent of the objects present in the scene. The salient regions are then used to glue together the segments obtained by over-segmenting the scene with a color segmentation algorithm, a depth segmentation algorithm, or a combination of both. We propose a late-fusion approach that first extracts segments from color and depth independently and then fuses them, exploiting the complementary nature of the two modalities. Furthermore, we investigate several mechanisms for ranking the object candidates. We evaluate on one publicly available dataset and on one challenging sequence with a high degree of clutter. The results show that we are able to retrieve most objects in real-world indoor scenes and clearly outperform other state-of-the-art methods.
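To make the two central steps concrete, the following is a minimal sketch of late fusion and saliency-based gluing on label maps. It is an illustration under stated assumptions, not the implementation evaluated here: the function names `fuse_segmentations` and `glue_segments`, the intersection-based fusion rule, and the `min_overlap` threshold are all hypothetical choices, and the segment maps and salient region are assumed to be given as NumPy arrays of equal shape.

```python
import numpy as np


def fuse_segmentations(color_labels, depth_labels):
    """Assumed late-fusion rule: two pixels share a fused segment only if
    they share both a color segment and a depth segment.
    Both inputs are integer label maps with non-negative labels."""
    n_depth = int(depth_labels.max()) + 1
    # Encode each (color, depth) label pair as one unique integer.
    combined = color_labels.astype(np.int64) * n_depth + depth_labels
    # Relabel the pairs that actually occur to 0..K-1.
    _, fused = np.unique(combined.ravel(), return_inverse=True)
    return fused.reshape(color_labels.shape)


def glue_segments(labels, salient_mask, min_overlap=0.5):
    """Glue together all segments that lie mostly inside a salient region.
    `min_overlap` (hypothetical parameter) is the fraction of a segment's
    area that must be salient for the segment to join the candidate."""
    n = int(labels.max()) + 1
    flat = labels.ravel()
    area = np.bincount(flat, minlength=n)
    inside = np.bincount(flat, weights=salient_mask.ravel().astype(float),
                         minlength=n)
    keep = inside / np.maximum(area, 1) >= min_overlap
    # Boolean mask of the object candidate, same shape as `labels`.
    return keep[labels]


if __name__ == "__main__":
    # Toy 4x4 scene: color splits left/right, depth splits top/bottom.
    color = np.zeros((4, 4), dtype=int)
    color[:, 2:] = 1
    depth = np.zeros((4, 4), dtype=int)
    depth[2:, :] = 1
    fused = fuse_segmentations(color, depth)

    # Salient blob covering the top-left quadrant plus one stray pixel.
    salient = np.zeros((4, 4), dtype=bool)
    salient[:2, :2] = True
    salient[0, 2] = True

    candidate = glue_segments(fused, salient)
    print(candidate.astype(int))
```

In this toy scene neither the color nor the depth segmentation alone isolates the top-left quadrant, but their fusion does, which is the sense in which the two cues are treated as complementary in the sketch.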
