Generating object hypotheses in natural scenes through human-robot interaction

We propose a method for interactive modeling of objects and object relations based on real-time segmentation of video sequences. In interaction with a human, the robot can perform multi-object segmentation through principled modeling of physical constraints. The key contribution is an efficient multi-labeling framework that allows object modeling and disambiguation in natural scenes. Object modeling and labeling are done in a real-time segmentation system, to which hypotheses and constraints denoting relations between objects can be added incrementally. Through instructions such as key presses or spoken words, a scene can be segmented into regions corresponding to multiple physical objects. The approach solves some of the difficult problems of disambiguating objects that are merged because they are in direct physical contact. Results show that even a limited set of simple interactions with a human operator can substantially improve segmentation results.
