Lifelong robotic object perception

In this thesis, we study Lifelong Robotic Object Perception. We propose, as a long-term goal, a framework that recognizes known objects and discovers unknown objects in the environment for as long as the robot operates. We build the foundations for Lifelong Robotic Object Perception by focusing on the two critical components of this framework: 1) how to recognize and register known objects for robotic manipulation, and 2) how to automatically discover novel objects in the environment so that we can recognize them in the future.

Our work on Object Recognition and Pose Estimation addresses two main challenges in computer vision for robotics: robust performance in complex scenes, and low latency for real-time operation. We present MOPED, a framework for Multiple Object Pose Estimation and Detection that integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable system. We extend MOPED to leverage RGBD images using an adaptive image-depth fusion model based on maximum likelihood estimates. We incorporate this model into each stage of MOPED to achieve object recognition that is robust to imperfect depth data.

In Robotic Object Discovery, we address the challenges of scalability and robustness for long-term operation. As a first step towards Lifelong Robotic Object Perception, we aim to automatically process the raw video stream of an entire workday of a robotic agent to discover novel objects. The key to achieving this goal is to incorporate non-visual information (robotic metadata) in the discovery process. We encode the natural constraints and non-visual sensory information available in service robotics to make long-term object discovery feasible. We introduce an optimized implementation, HerbDisc, that processes a video stream of 6 h 20 min of challenging human environments in under 19 min and discovers 206 novel objects.

We tailor our solutions to the sensing capabilities and requirements of service robotics, with the goal of enabling our service robot, HERB, to operate autonomously in human environments.

[1]  V. Lepetit,et al.  EPnP: An Accurate O(n) Solution to the PnP Problem , 2009, International Journal of Computer Vision.

[2]  Ross A. Knepper,et al.  Herb 2.0: Lessons Learned From Developing a Mobile Manipulator for the Home , 2012, Proceedings of the IEEE.

[3]  Yiannis Aloimonos,et al.  Visual Segmentation of Simple Objects for Robots , 2011, Robotics: Science and Systems.

[4]  Changchang Wu,et al.  SiftGPU : A GPU Implementation of Scale Invariant Feature Transform (SIFT) , 2007 .

[5]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[6]  Derek Hoiem,et al.  Category Independent Object Proposals , 2010, ECCV.

[7]  Jitendra Malik,et al.  Using contours to detect and localize junctions in natural images , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  R. Redner,et al.  Mixture densities, maximum likelihood, and the EM algorithm , 1984 .

[9]  Mario Fritz,et al.  Improving the Kinect by Cross-Modal Stereo , 2011, BMVC.

[10]  Yiannis Aloimonos,et al.  Segmenting “simple” objects using RGB-D , 2012, 2012 IEEE International Conference on Robotics and Automation.

[11]  Martial Hebert,et al.  Natural terrain classification using three‐dimensional ladar data for ground robot mobility , 2006, J. Field Robotics.

[12]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[13]  Friedhelm Meyer auf der Heide,et al.  The randomized z-buffer algorithm: interactive rendering of highly complex scenes , 2001, SIGGRAPH.

[14]  Andrea Salgian,et al.  Appearance-based object recognition using multiple views , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[15]  Vincent Lepetit,et al.  Dominant orientation templates for real-time detection of texture-less objects , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Lawrence O. Hall,et al.  A Cluster Ensemble Framework for Large Data sets , 2006, 2006 IEEE International Conference on Systems, Man and Cybernetics.

[17]  Cristian Sminchisescu,et al.  Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  David G. Lowe,et al.  Three-Dimensional Object Recognition from Single Two-Dimensional Images , 1987, Artif. Intell..

[19]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[20]  Nico Blodow,et al.  Close-range scene segmentation and reconstruction of 3D point cloud maps for mobile manipulation in domestic environments , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[21]  Thorsten Thormählen,et al.  Keyframe Selection for Camera Motion and Structure Estimation from Multiple Views , 2004, ECCV.

[22]  Zhengyou Zhang,et al.  Parameter estimation techniques: a tutorial with application to conic fitting , 1997, Image Vis. Comput..

[23]  Alvaro Collet,et al.  Making specific features less discriminative to improve point-based 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[24]  Dieter Fox,et al.  A Scalable Tree-Based Approach for Joint Object and Pose Recognition , 2011, AAAI.

[25]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[26]  Chitra Dorai,et al.  3D object recognition: Representation and matching , 2000, Stat. Comput..

[27]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  M. Wertheimer Laws of organization in perceptual forms. , 1938 .

[29]  Christoph H. Lampert,et al.  Unsupervised Object Discovery: A Comparison , 2010, International Journal of Computer Vision.

[30]  Richard Szeliski,et al.  Recovering 3D shape and motion from image streams using nonlinear least squares , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[32]  Siddhartha S. Srinivasa,et al.  Efficient multi-view object recognition and full pose estimation , 2010, 2010 IEEE International Conference on Robotics and Automation.

[33]  Radu Bogdan Rusu,et al.  3D is here: Point Cloud Library (PCL) , 2011, 2011 IEEE International Conference on Robotics and Automation.

[34]  Michal Havlena,et al.  Efficient Structure from Motion by Graph Optimization , 2010, ECCV.

[35]  Dieter Fox,et al.  Toward object discovery and modeling via 3-D scene comparison , 2011, 2011 IEEE International Conference on Robotics and Automation.

[36]  Siddhartha S. Srinivasa,et al.  Object recognition and full pose registration from a single image for robotic manipulation , 2009, 2009 IEEE International Conference on Robotics and Automation.

[37]  Nico Blodow,et al.  Fast Point Feature Histograms (FPFH) for 3D registration , 2009, 2009 IEEE International Conference on Robotics and Automation.

[38]  Ronen Basri,et al.  Segmentation and boundary detection using multiscale intensity measurements , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[39]  Gert Kootstra,et al.  Fast and bottom-up object detection, segmentation, and evaluation using Gestalt principles , 2011, 2011 IEEE International Conference on Robotics and Automation.

[40]  Peter M. Kogge,et al.  Cache implications of aggressively pipelined high performance microprocessors , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[41]  Christian Perwass,et al.  Increasing pose estimation performance using multi-cue integration , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..

[42]  D. Marquardt An Algorithm for Least-Squares Estimation of Nonlinear Parameters , 1963 .

[43]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[44]  Mubarak Shah,et al.  Multi-sensor fusion: a perspective , 1990, Proceedings., IEEE International Conference on Robotics and Automation.

[45]  Richard I. Hartley,et al.  Optimised KD-trees for fast image descriptor matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Siddhartha S. Srinivasa,et al.  MOPED: A scalable and low latency object recognition and pose estimation system , 2010, 2010 IEEE International Conference on Robotics and Automation.

[47]  Wolfram Burgard,et al.  Unsupervised learning of 3D object models from partial views , 2009, 2009 IEEE International Conference on Robotics and Automation.

[48]  Nico Blodow,et al.  Towards 3D Point cloud based object maps for household environments , 2008, Robotics Auton. Syst..

[49]  Alexei A. Efros,et al.  Recovering Occlusion Boundaries from a Single Image , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[50]  Vincent Lepetit,et al.  Monocular Model-Based 3D Tracking of Rigid Objects: A Survey , 2005, Found. Trends Comput. Graph. Vis..

[51]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[52]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[53]  Pietro Perona,et al.  Towards automatic discovery of object categories , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[54]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[55]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[56]  Geoffrey A. Hollinger,et al.  HERB: a home exploring robotic butler , 2010, Auton. Robots.

[57]  Gary R. Bradski,et al.  Fast 3D recognition and pose using the Viewpoint Feature Histogram , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[58]  Paul Newman,et al.  Online generation of scene descriptions in urban environments , 2008, Robotics Auton. Syst..

[59]  Nico Blodow,et al.  General 3D modelling of novel objects from a single view , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[60]  Richard Szeliski,et al.  Skeletal graphs for efficient structure from motion , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Björn Johansson,et al.  Comparison of local image descriptors for full 6 degree-of-freedom pose estimation , 2009, 2009 IEEE International Conference on Robotics and Automation.

[62]  A. Owen,et al.  A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae) , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[63]  Wei Tang,et al.  Clustering with Multiple Graphs , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[64]  Dieter Fox,et al.  RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments , 2010, ISER.

[65]  Takeo Kanade,et al.  Discovering object instances from scenes of Daily Living , 2011, 2011 International Conference on Computer Vision.

[66]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[67]  David G. Lowe,et al.  Using stereo for object recognition , 2010, 2010 IEEE International Conference on Robotics and Automation.

[68]  Francesco Zanichelli,et al.  The long and winding road to high-performance image processing with MMX/SSE , 2000, Proceedings Fifth IEEE International Workshop on Computer Architectures for Machine Perception.

[69]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[70]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[71]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[72]  Luc Van Gool,et al.  Fast scale invariant feature detection and matching on programmable graphics hardware , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[73]  Zhuowen Tu,et al.  Image Segmentation by Data-Driven Markov Chain Monte Carlo , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[74]  Jacob V. Bouvrie Multi-Source Contingency Clustering , 2004 .

[75]  Kurt Konolige,et al.  Projected texture stereo , 2010, 2010 IEEE International Conference on Robotics and Automation.

[76]  Dieter Fox,et al.  Object recognition with hierarchical kernel descriptors , 2011, CVPR 2011.

[77]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[78]  Siddhartha S. Srinivasa,et al.  Structure discovery in multi-modal data: A region-based approach , 2011, 2011 IEEE International Conference on Robotics and Automation.

[79]  Yong Jae Lee,et al.  Learning the easy things first: Self-paced visual category discovery , 2011, CVPR 2011.

[80]  Shimon Ullman,et al.  Combining Top-Down and Bottom-Up Segmentation , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[81]  Jake K. Aggarwal,et al.  The Integration of Image Segmentation Maps using Region and Edge Information , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[82]  Dieter Fox,et al.  Sparse distance learning for object recognition combining RGB and depth information , 2011, 2011 IEEE International Conference on Robotics and Automation.

[83]  Wen-Yan Chang,et al.  On pose recovery for generalized visual sensors , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[84]  Vincent Lepetit,et al.  Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes , 2011, 2011 International Conference on Computer Vision.

[85]  Long Quan,et al.  Resampling Structure from Motion , 2010, ECCV.

[86]  David S. Johnson,et al.  Fast Algorithms for Bin Packing , 1974, J. Comput. Syst. Sci..

[87]  Larry S. Davis,et al.  Model-based object pose in 25 lines of code , 1992, International Journal of Computer Vision.

[88]  Vincent Lepetit,et al.  Fast Keypoint Recognition Using Random Ferns , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[89]  Joachim Hertzberg,et al.  The Efficient Extension of Globally Consistent Scan Matching to 6 DoF , 2008 .

[90]  Remco C. Veltkamp,et al.  A survey of content based 3D shape retrieval methods , 2004, Proceedings Shape Modeling Applications, 2004..

[91]  M. Vincze,et al.  BLORT-The Blocks World Robotic Vision Toolbox , 2010 .

[92]  Yizong Cheng,et al.  Mean Shift, Mode Seeking, and Clustering , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[93]  Takeo Kanade,et al.  Connecting Missing Links: Object Discovery from Sparse Observations Using 5 Million Product Images , 2012, ECCV.

[94]  Burcu Akinci,et al.  A Comparative Analysis of Depth-Discontinuity and Mixed-Pixel Detection Algorithms , 2007, Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007).

[95]  智一 吉田,et al.  Efficient Graph-Based Image Segmentationを用いた圃場図自動作成手法の検討 , 2014 .

[96]  Ranga Vemuri,et al.  Hardware-software partitioning and pipelined scheduling of transformative applications , 2002, IEEE Trans. Very Large Scale Integr. Syst..

[97]  Siddhartha S. Srinivasa,et al.  The MOPED framework: Object recognition and pose estimation for manipulation , 2011, Int. J. Robotics Res..

[98]  Alexei A. Efros,et al.  Geometric context from a single image , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[99]  Axel Pinz,et al.  Robust Pose Estimation from a Planar Target , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[100]  Andrew Zisserman,et al.  Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets , 2011, International Journal of Computer Vision.

[101]  Danica Kragic,et al.  Active 3D scene segmentation and detection of unknown objects , 2010, 2010 IEEE International Conference on Robotics and Automation.

[102]  Alberto Del Bimbo,et al.  Content-based retrieval of 3D models , 2006, TOMCCAP.

[103]  Sebastian Thrun,et al.  An Application of Markov Random Fields to Range Sensing , 2005, NIPS.

[104]  Dimitris N. Metaxas,et al.  D - Clutter: Building object model library from unsupervised segmentation of cluttered scenes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[105]  Jonathan T. Barron,et al.  A category-level 3-D object dataset: Putting the Kinect to work , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[106]  Vincent Lepetit,et al.  Stable real-time 3D tracking using online and offline information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[107]  Hirokazu Kato,et al.  Marker tracking and HMD calibration for a video-based augmented reality conferencing system , 1999, Proceedings 2nd IEEE and ACM International Workshop on Augmented Reality (IWAR'99).

[108]  U. Brandes A faster algorithm for betweenness centrality , 2001 .

[109]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[110]  Ayellet Tal,et al.  Hierarchical mesh decomposition using fuzzy clustering and cuts , 2003, ACM Trans. Graph..

[111]  F. Huang,et al.  Generalized Pseudo-Likelihood Estimates for Markov Random Fields on Lattice , 2002 .

[112]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[113]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[114]  David Nistér,et al.  Frame Decimation for Structure and Motion , 2000, SMILE.

[115]  Dana H. Ballard,et al.  Generalizing the Hough transform to detect arbitrary shapes , 1981, Pattern Recognit..