HerbDisc: Towards lifelong robotic object discovery

Our long-term goal is to develop a general solution to the lifelong robotic object discovery (LROD) problem: to discover new objects in the environment while the robot operates, for as long as the robot operates. In this paper, we consider the first step towards LROD: we automatically process the raw data stream of an entire workday of a robotic agent to discover objects. Our key contribution towards this goal is to incorporate domain knowledge (robotic metadata) into the discovery process, in addition to visual data. We propose a general graph-based formulation for LROD in which generic domain knowledge is encoded as constraints. To make long-term object discovery feasible, we encode into our formulation the natural constraints and non-visual sensory information available in service robotics. A key advantage of our generic formulation is that we can add, modify, or remove sources of domain knowledge dynamically, as they become available or as conditions change. In our experiments, we show that by adding domain knowledge we discover 2.7× more objects and reduce processing time by a factor of 190. With our optimized implementation, HerbDisc, we show for the first time a system that processes a video stream of 6 h 20 min of continuous exploration in cluttered human environments (over half a million images) in 18 min 34 s, discovering 206 new objects together with their 3D models.
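
To make the idea of encoding domain knowledge as constraints in a graph more concrete, the following is a minimal sketch and not the HerbDisc implementation. It assumes that object candidates are nodes, that each source of domain knowledge is a pairwise predicate, and that groups of mutually consistent candidates can be read off as connected components. The names `Candidate`, `close_in_time`, `close_in_space`, `similar_appearance`, `build_graph`, and `connected_components`, as well as all thresholds, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical object candidate (e.g. an image region) plus robot metadata."""
    cid: int
    timestamp: float                  # seconds since the start of the recording
    position: Tuple[float, ...]       # assumed 3D location in a world frame
    appearance: Tuple[float, ...]     # toy visual descriptor


# A constraint is any pairwise predicate over two candidates.
Constraint = Callable[[Candidate, Candidate], bool]


def close_in_time(max_gap: float) -> Constraint:
    """Assumed metadata constraint: candidates observed close together in time."""
    return lambda a, b: abs(a.timestamp - b.timestamp) <= max_gap


def close_in_space(max_dist: float) -> Constraint:
    """Assumed metadata constraint: candidates located close together in 3D."""
    def check(a: Candidate, b: Candidate) -> bool:
        d2 = sum((p - q) ** 2 for p, q in zip(a.position, b.position))
        return d2 <= max_dist ** 2
    return check


def similar_appearance(max_diff: float) -> Constraint:
    """Assumed visual constraint: descriptors differ by at most max_diff per entry."""
    return lambda a, b: max(
        abs(p - q) for p, q in zip(a.appearance, b.appearance)
    ) <= max_diff


def build_graph(candidates: List[Candidate],
                constraints: List[Constraint]) -> Dict[int, Set[int]]:
    """Connect two candidates only if every active constraint accepts the pair."""
    graph: Dict[int, Set[int]] = {c.cid: set() for c in candidates}
    for a, b in combinations(candidates, 2):
        if all(check(a, b) for check in constraints):
            graph[a.cid].add(b.cid)
            graph[b.cid].add(a.cid)
    return graph


def connected_components(graph: Dict[int, Set[int]]) -> List[List[int]]:
    """Group candidates into connected components with an iterative DFS."""
    seen: Set[int] = set()
    components: List[List[int]] = []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(graph[node] - seen)
        components.append(sorted(comp))
    return components
```

Because the constraints are passed as a plain list, a source of domain knowledge can be added or removed by editing that list and rebuilding the graph. This toy version tests all pairs exhaustively and makes no attempt to reproduce the speed-ups reported above.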
