Combining Cognitive Vision, Knowledge-Level Planning with Sensing, and Execution Monitoring for Effective Robot Control

We describe an approach to robot control in real-world environments that integrates a cognitive vision system with a knowledge-level planner and plan execution monitor. Our approach makes use of a formalism called an Object-Action Complex (OAC) to overcome some of the representational differences that arise between the low-level control mechanisms and high-level reasoning components of the system. We are particularly interested in using OACs as a formalism that enables us to induce certain aspects of the representation, suitable for planning, through the robot's interaction with the world. Although this work is at a preliminary stage, we have implemented our ideas in a framework that supports object discovery, planning with sensing, action execution, and failure recovery, with the long-term goal of designing a system that can be transferred to other robot platforms and planners.

Introduction and Motivation

A robot operating in a real-world domain must typically rely on a range of mechanisms that combine both reactive and planned behaviour, and operate at different levels of representational abstraction. Building a system that can effectively perform these tasks requires overcoming a number of theoretical and practical challenges that arise from integrating such diverse components within a single framework. One of the crucial aspects of the integration task is representation: the requirements of robot controllers differ from those of traditional planning systems, and neither representation is usually sufficient to accommodate the needs of an integrated system. For instance, robot systems often use real-valued representations to model features like 3D spatial coordinates and joint angles, allowing robot behaviours to be specified as continuous transforms of vectors over time (Murray, Li, and Sastry 1994). On the other hand, planning systems tend to use representations based on discrete, symbolic models of objects, properties, and actions, described in languages like STRIPS (Fikes and Nilsson 1971) or PDDL (McDermott 1998). Overcoming these differences is essential for building a system that can act in the real world.

In this paper we describe an approach that combines a cognitive vision system with a knowledge-level planner and plan execution monitor, on a robot platform that can manipulate objects in a restricted, but uncertain, environment. Our system uses a multi-level architecture that mixes a low-level robot/vision controller for object manipulation and scene interpretation, with high-level components for reasoning, planning, and action failure recovery. To overcome the modelling differences between the different system components, we use a representational unit called an Object-Action Complex (OAC) (Geib et al. 2006; Krüger et al. 2009), which arises naturally from the robot's interaction with the world. OACs provide an object/situation-oriented notion of affordance in a universal formalism for describing state change.

Although the idea of combining a robot/vision system with an automated planner is not new, the particular components we use each bring their own strengths to this work. For instance, the cognitive vision system (Krüger, Lappe, and Wörgötter 2004; Pugeault 2008) provides a powerful object discovery mechanism that lets us induce certain aspects of the representation, suitable for planning, from the robot's basic "reflex" actions.
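As an illustration of this kind of induction, the following Python sketch shows how a newly discovered robot-level object, described by real-valued vision data, might give rise to a high-level symbol and an initial set of fluents for the planner. The names, fields, and fluents are hypothetical and chosen only to make the idea concrete; they are not the interfaces of the actual system.

```python
from dataclasses import dataclass

@dataclass
class RobotLevelObject:
    """A robot-level object hypothesis produced by the vision system (hypothetical)."""
    obj_id: int
    pose: tuple          # real-valued 6D pose estimate (x, y, z, roll, pitch, yaw)
    radius: float        # estimated size of the object

def induce_symbol(obj_r: RobotLevelObject):
    """Map a robot-level object to a high-level symbol and initial fluents.

    Properties the robot has not yet sensed (e.g., openness) are left unknown,
    so the planner can later reason about sensing them.
    """
    obj_p = f"obj{obj_r.obj_id}"                   # planner-level counterpart of obj_r
    fluents = {
        ("radius", obj_p): obj_r.radius,           # induced from vision data
        ("ontable", obj_p): True,                  # assumed initial placement
        ("open", obj_p): None,                     # unknown: requires a sensing action
    }
    return obj_p, fluents

# Example: a discovered object becomes a planning symbol with partial knowledge.
obj_p, fluents = induce_symbol(
    RobotLevelObject(obj_id=1, pose=(0.4, 0.1, 0.02, 0.0, 0.0, 0.0), radius=0.03))
print(obj_p, fluents)
```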
The high-level planner, PKS (Petrick and Bacchus 2002; 2004), is effective at constructing plans under conditions of incomplete information, with both ordinary physical actions and sensing actions. Moreover, OACs occur at all levels of the system and, we believe, provide a novel solution to some of the integration problems that arise in our architecture.

This paper reports on work currently in progress, centred around OACs and their role in object discovery, planning with sensing, action execution, and failure recovery in uncertain domains. This work also forms part of a larger project investigating perception, action, and cognition, and combines multiple robot platforms with symbolic representations and reasoning mechanisms. We have therefore approached this work with a great deal of generality, in order to facilitate the transfer of our ideas to robot platforms and planners with capabilities other than those we describe here.

Hardware Setup and Testing Domain

The hardware setup used in this work (see Figure 1) consists of a six-degree-of-freedom industrial robot arm (Stäubli RX60) with a force/torque (FT) sensor (Schunk FTACL 5080) and a two-finger-parallel gripper (Schunk PG 70) attached. The FT sensor is mounted between the robot arm and gripper and is used to detect collisions which might occur due to limited knowledge about the objects in the world. In addition, a calibrated stereo camera system is mounted in a fixed position. The AVT Pike cameras have a resolution of up to 2048x2048 pixels and can produce high-resolution images for particular regions of interest.

[Figure 1: Hardware setup - industrial robot, two-finger gripper, foam floor, stereo camera system, 6D force/torque sensor.]

To test our approach, we use a Blocksworld-like object manipulation scenario. This domain consists of a table with a number of objects on it and a "shelf" (a special region of the table). The robot can view the objects in the world but, initially, does not have any knowledge about those objects. Instead, world knowledge must be provided by the vision system, the robot's sensors, and the primitive actions built into the robot controller. The robot is given the task of clearing the objects from the table by placing them onto the shelf. The shelf has limited space so the objects must be stacked in order to successfully complete the task. For simplicity, each object has a radius which provides an estimate of its size. An object A can be stacked into an object B provided the radius of A is less than that of B, and B is "open." Unlike standard Blocksworld, the robot does not have complete information about the state of the world. Instead, we consider scenarios where the robot does not know whether an object is open or not and must perform a test to determine an object's "openness". The robot also has a choice of four different grasping types for manipulating objects in the world. Not all grasp types can be used on every object, and certain grasp types are further restricted by the position of an object relative to other objects in the world. Finally, actions can fail during execution and the robot's sensors may return noisy data.

Basic Representations and OACs

At the robot/vision level, the system has a set Σ of sensors, Σ = {σ1, σ2, ..., σn}, where each sensor σi returns an observation obs(σi) about some feature of the world, represented as a real-valued vector. The execution of a robot-level action, called a motor program, may cause changes to the world which can be observed through subsequent sensing.
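A minimal Python sketch of this robot-level interface is given below. The names and types (make_sensor_set, MotorProgram, the example sensors) are our own illustrative assumptions and do not reflect the structure of the actual controller.

```python
from typing import Callable, Dict, List

# A sensor is modelled as a function that returns a real-valued observation vector.
Observation = List[float]
Sensor = Callable[[], Observation]

def make_sensor_set(sensors: Dict[str, Sensor]) -> Dict[str, Observation]:
    """Poll every sensor in Sigma and collect obs(sigma_i) for each one."""
    return {name: sensor() for name, sensor in sensors.items()}

class MotorProgram:
    """A robot-level action; executing it may change the world (hypothetical interface)."""
    def __init__(self, name: str, controller: Callable[[], bool]):
        self.name = name
        self._controller = controller

    def execute(self) -> bool:
        """Run the low-level controller; returns True if execution reported success."""
        return self._controller()

# Example: execute a motor program, then observe its effects through subsequent sensing.
sensors = {"force_torque": lambda: [0.0] * 6, "gripper_width": lambda: [0.07]}
grasp = MotorProgram("grasp", controller=lambda: True)   # stand-in controller
if grasp.execute():
    observations = make_sensor_set(sensors)
```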
Each motor program is typically executed with respect to particular objects in the world. We assume that initially the robot/vision system does not know about any objects and, therefore, cannot execute many motor programs. Instead, the robot has a set of object-independent basic reflex actions which it can use in conjunction with the vision system for early exploration and object discovery.

At the planning level, the underlying representation is based on a set of fluents, f1, f2, ..., fm: first-order predicates and functions that denote particular qualities of the world, robot, and objects. Fluents typically represent high-level versions of some of the world-level properties the robot is capable of sensing, where the value of a fluent is a function Γi of a set of observations returned by the sensor set, i.e., fi = Γi(Σ). However, in general, not every sensor need map to some fluent, and we allow for the possibility of fluents with no direct mapping to robot-level sensors. Fluents may be parametrized and instantiated by high-level counterparts of the objects discovered at the robot level. In particular, for each robot-level object obj_r we denote a corresponding high-level object by obj_p.

A state is a snapshot of the values of all instantiated fluents at some point during the execution of the system, i.e., {f1, f2, ..., fm}. States represent an intersection between the low-level and high-level representations and are induced from the sensor observations (the Γi functions) and the object set.

The planning-level representation also includes a set of high-level actions, α1, α2, ..., αp, which are viewed as abstract versions of some of the robot's motor programs. Since all actions must ultimately be executed by the robot, each action is decomposable into a fixed set of motor programs Π(αi) = {mp1, mp2, ..., mpl}, where each mpj is a motor program. As with fluents, not every robot-level motor program need map to a high-level action.

Although the robot/vision and planning levels use quite different representations (i.e., real-valued vectors versus logical fluents), the notions of "action" and "state change" are common among these components. To capture these similarities, we model our actions and motor programs using a structure called an Object-Action Complex (OAC) (Geib et al. 2006; Krüger et al. 2009). Formally, an OAC is a tuple ⟨I, T, M⟩, where I is an identifier label for the OAC, T : S → S is a transition function over a state space S, and M is a statistical measure of the accuracy of the transition. OACs provide a universal "container" for encapsulating the relationship between actions (operating over objects) and the changes they make to their state spaces. Each OAC also has an identical set of predefined operations (e.g., composition, update, etc.), providing a common interface to these structures. Since robot systems may have many components, OACs are meant to provide a standard language for describing action-like processes (including continuous processes) within the system.
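The following Python sketch is offered only as an illustration of the formal definition above; the names and the simple frequency-based statistics are our own simplifying assumptions, not the project's reference implementation. It shows an OAC as a container pairing an identifier and a transition function with a running measure of how reliably that transition holds:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# A state is a snapshot of instantiated fluent values, e.g. ("open", "obj1") -> True.
State = Dict[Tuple[str, ...], Any]

@dataclass
class OAC:
    """An Object-Action Complex: an identifier I, a transition T over a state space,
    and a statistical measure M of how accurate that transition has proven to be."""
    identifier: str
    transition: Callable[[State], State]     # T : S -> S, the predicted state change
    successes: int = 0
    trials: int = 0

    @property
    def accuracy(self) -> float:
        """M: empirical accuracy of the predicted transition (a simple frequency estimate)."""
        return self.successes / self.trials if self.trials else 0.0

    def update(self, predicted: State, observed: State) -> None:
        """Update the statistics after execution, comparing prediction with observation."""
        self.trials += 1
        if predicted == observed:
            self.successes += 1

# Example: a high-level "stack" OAC whose transition marks obj1 as placed inside obj2.
def stack_effect(state: State) -> State:
    new_state = dict(state)
    new_state[("in", "obj1", "obj2")] = True
    return new_state

stack_oac = OAC(identifier="stack(obj1, obj2)", transition=stack_effect)
predicted = stack_oac.transition({("open", "obj2"): True})
stack_oac.update(predicted, observed=predicted)   # here the observed state matched the prediction
```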

References

[1] Fahiem Bacchus et al. A Knowledge-Based Approach to Planning with Incomplete Information and Sensing. AIPS, 2002.

[2] Yoav Freund et al. Large Margin Classification Using the Perceptron Algorithm. COLT, 1998.

[3] Nicolas Pugeault et al. Early Cognitive Vision: Feedback Mechanisms for the Disambiguation of Early Visual Representation, 2008.

[4] Justus H. Piater et al. A Probabilistic Framework for 3D Visual Object Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.

[5] Danica Kragic et al. Birth of the Object: Detection of Objectness and Extraction of Object Shape through Object-Action Complexes. Int. J. Humanoid Robotics, 2008.

[6] Mark Steedman et al. Using Kernel Perceptrons to Learn Action Effects for Planning, 2008.

[7] Edwin P. D. Pednault et al. ADL: Exploring the Middle Ground Between STRIPS and the Situation Calculus. KR, 1989.

[8] M. Aizerman et al. Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning, 1964.

[9] Markus Lappe et al. Biologically Motivated Multi-modal Processing of Visual Primitives, 2003.

[10] Craig A. Knoblock et al. PDDL - The Planning Domain Definition Language, 1998.

[11] Richard Fikes et al. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. IJCAI, 1971.

[12] Christopher W. Geib et al. A Formal Definition of Object-Action Complexes and Examples at Different Levels of the Processing Hierarchy (project deliverable), 2009.

[13] Danica Kragic et al. Early Reactive Grasping with Second Order 3D Feature Relations, 2007.

[14] Dirk Kraft et al. A Hierarchical 3D Circle Detection Algorithm Applied in a Grasping Scenario. VISAPP, 2009.

[15] Mark Steedman et al. Learning Action Effects in Partially Observable Domains. ECAI, 2010.

[16] Fahiem Bacchus et al. Extending the Knowledge-Based Approach to Planning with Incomplete Information and Sensing. ICAPS, 2004.

[17] Robert Givan et al. FF-Replan: A Baseline for Probabilistic Planning. ICAPS, 2007.

[18] Christopher W. Geib et al. Object Action Complexes as an Interface for Planning and Robot Control, 2006.

[19] Ronald P. A. Petrick et al. Planning Dialog Actions. SIGDIAL, 2007.

[20] Richard M. Murray et al. A Mathematical Introduction to Robotic Manipulation, 1994.