RoboSherlock: Cognition-enabled Robot Perception for Everyday Manipulation Tasks

A pressing question when designing intelligent autonomous systems is how to integrate the various subsystems concerned with complementary tasks. More specifically, robotic vision must provide task-relevant information about the environment and the objects in it to various planning-related modules. In most implementations of the traditional Perception-Cognition-Action paradigm, these tasks are treated as quasi-independent modules that function as black boxes for each other. It is our view that perception can benefit tremendously from a tight collaboration with cognition. We present RoboSherlock, a knowledge-enabled cognitive perception system for mobile robots performing human-scale everyday manipulation tasks. In RoboSherlock, perception and interpretation of realistic scenes are formulated as an unstructured information management (UIM) problem. The application of the UIM principle supports the implementation of perception systems that can answer task-relevant queries about objects in a scene, boost object recognition performance by combining the strengths of multiple perception algorithms, support knowledge-enabled reasoning about objects, and enable automatic and knowledge-driven generation of processing pipelines. We demonstrate the potential of the proposed framework through feasibility studies of systems for real-world scene perception that have been built on top of the framework.
