Structure discovery in multi-modal data: A region-based approach

The ability of a perception system to discern what is important in a scene and what is not is an invaluable asset, with applications in object recognition, people detection and SLAM, among others. In this paper, we aim to analyze all available sensory data to separate a scene into a few physically meaningful parts, which we term structure, while discarding background clutter. In particular, we consider the combination of image and range data, and base our decision on both appearance and 3D shape. Our main contribution is a framework for scene segmentation that preserves physical objects using multi-modal data. We combine image and range data using a novel mid-level fusion technique based on the concept of regions, which avoids any pixel-level correspondences between data sources. We associate groups of pixels with 3D points to form multi-modal regions that we term regionlets, and measure the structure-ness of each regionlet using simple, bottom-up cues from image and range features. We show that the highest-ranked regionlets correspond to the most prominent objects in the scene. We verify the validity of our approach on 105 scenes of household environments.
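The region-level association described above can be illustrated with a minimal sketch. It is not the method from the paper: it assumes a precomputed image segmentation given as a label map, a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and 3D points already expressed in the camera frame. The function names and the scoring rule (`placeholder_score`) are invented for illustration and stand in for the image and range cues, which the abstract does not specify.

```python
from collections import defaultdict
from statistics import pvariance


def build_regionlets(labels, points, fx, fy, cx, cy):
    """Group 3D points by the image region their projection falls into.

    labels: 2D list of region ids (a hypothetical image segmentation).
    points: iterable of (x, y, z) tuples in the camera frame.
    Assumes a pinhole camera model; intrinsics are illustrative.
    """
    height, width = len(labels), len(labels[0])
    regionlets = defaultdict(lambda: {"pixels": 0, "points": []})

    # Count the pixels that belong to each image region.
    for row in labels:
        for label in row:
            regionlets[label]["pixels"] += 1

    # Attach each 3D point to the region that contains its projection.
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            regionlets[labels[v][u]]["points"].append((x, y, z))

    return dict(regionlets)


def placeholder_score(regionlet):
    """Stand-in structure-ness cue (an assumption, not the paper's):
    favors regionlets with many points and a small depth spread."""
    depths = [p[2] for p in regionlet["points"]]
    if len(depths) < 2:
        return 0.0
    return len(depths) / (1.0 + pvariance(depths))


def rank_regionlets(regionlets):
    """Return region ids sorted from highest to lowest score."""
    return sorted(
        regionlets,
        key=lambda rid: placeholder_score(regionlets[rid]),
        reverse=True,
    )
```

In this sketch each regionlet is a whole image region paired with the set of range points that land in it, so the range data can be much sparser than the image. Any real scoring function would replace `placeholder_score`.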
