Context-aware 3D object anchoring for mobile robots

Abstract. A world model representing the elements in a robot’s environment needs to maintain a correspondence between the objects being observed and their internal representations, a requirement known as the anchoring problem. Anchoring is a key aspect of intelligent robot operation, since it enables high-level functions such as task planning and execution. This work presents an anchoring system that continually integrates new observations from a 3D object recognition algorithm into a probabilistic world model. The system exploits the contextual relations inherent to human-made spaces to improve the classification results of the baseline object recognition system. To that end, it builds a graph-based world model containing the objects in the scene (from both current and previous observations), which a Probabilistic Graphical Model (PGM) exploits to leverage contextual information during recognition. The world model also lets the system use information about objects beyond the current field of view of the robot’s sensors. Most importantly, this is done online, overcoming the disadvantages of both single-shot recognition systems (e.g., limited sensor aperture) and offline recognition systems that require prior registration of all frames of a scene (e.g., difficulty with dynamic scenes, unsuitability for plan-based robot control). We also propose a novel way to include the outcome of local object recognition methods in the PGM, which reduces the usually high complexity of model learning and increases system performance. The system has been evaluated on a dataset collected by a mobile robot in restaurant-like settings, with positive results for both data association and object recognition. It has also been used successfully in the RACE robotic architecture.
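As a rough illustration of what "continually integrating observations into a probabilistic world model" can involve, the sketch below associates each new detection with an existing anchor and fuses their class beliefs. It is not the system described in the abstract: the greedy nearest-centroid association, the naive Bayesian fusion, and every name in it (Anchor, fuse, integrate, max_dist) are assumptions made for illustration, and the contextual PGM step is left out entirely.

```python
import math


class Anchor:
    """Hypothetical internal representation of one physical object."""

    def __init__(self, anchor_id, centroid, class_probs):
        self.anchor_id = anchor_id
        self.centroid = centroid              # (x, y, z) in the map frame
        self.class_probs = dict(class_probs)  # class label -> belief


def fuse(prior, likelihood, eps=1e-6):
    """Naive Bayesian fusion of two class distributions (assumed independent)."""
    classes = set(prior) | set(likelihood)
    unnorm = {c: prior.get(c, eps) * likelihood.get(c, eps) for c in classes}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def integrate(anchors, observations, max_dist=0.3):
    """Greedily match each observation to the nearest unmatched anchor.

    observations: list of (centroid, class_probs) pairs from a recognizer.
    An observation farther than max_dist (metres, assumed) from every
    unmatched anchor creates a new anchor.
    """
    matched = set()
    for centroid, probs in observations:
        best, best_d = None, max_dist
        for anchor in anchors:
            if anchor.anchor_id in matched:
                continue
            d = math.dist(anchor.centroid, centroid)
            if d <= best_d:
                best, best_d = anchor, d
        if best is None:
            # No anchor close enough: treat this as a newly seen object.
            best = Anchor(len(anchors), centroid, probs)
            anchors.append(best)
        else:
            # Re-observed object: update its belief and last known position.
            best.class_probs = fuse(best.class_probs, probs)
            best.centroid = centroid
        matched.add(best.anchor_id)
    return anchors
```

Calling integrate once per frame on the same anchors list keeps objects outside the current field of view in the model, since unmatched anchors are never deleted. A fuller treatment would replace the greedy matching with a global assignment step and the independent fusion with joint inference over related objects.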
