The COLLAGE/KHOROS Link: Planning for Image Processing Tasks

1 Data Analysis Planning

This paper describes the application of the COLLAGE planner to the task of generating image processing plans for satellite remote sensing data. In particular, we focus on the linkage of COLLAGE to the KHOROS image processing system. Several obvious requirements presented themselves when we first confronted the integration of COLLAGE and KHOROS: low-level connection tasks, representation translation tasks, and the need to present users with a suitably coherent combined architecture. Over time, however, one overarching and pervasive issue became clear: how to represent and partition information in a way that fosters extensibility and flexibility. This is necessary for at least two reasons. First, KHOROS is an "open" system: its suite of image processing algorithms is constantly changing. Second, our combined architecture must be usable by a variety of users with different skill levels. These kinds of issues, of course, are common to many software engineering enterprises. Our experience with COLLAGE indicates that planning systems will also have to cope with them when they are used within operational environments.

The goal of this work is to apply domain-independent planning methods to help scientists plan out their daily data analysis tasks. We are particularly interested in aiding Earth system scientists who study Earth’s ecosystems using a mixture of remotely sensed data (satellite imagery) and ground-based data sets (e.g., vegetation studies, soil maps). Although these scientists are most interested in developing theories or models, they usually find themselves spending the bulk of their time puzzling over low-level data selection and manipulation tasks. Such tasks make up the "busy work" of their science. In the era of EOS (NASA’s Earth Observing System, a suite of satellites slated for launch in the next decade), scientists will have more data at their fingertips than ever before: an expected 1.2 terabytes/day.
From: AAAI Technical Report SS-95-04. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.

Fundamental innovations are required to keep the relatively small Earth science community from being swamped by the deluge. For example, conducting even a relatively small study using one or two images can take weeks of a scientist’s time. The scientist may have to use two or three image processing or geographic information systems, each with its own set of algorithms, formatting requirements, and idiosyncrasies regarding parameter usage. More often than not, each of these systems resides on a different machine. To compound the problem further, a scientist must typically access several distinct databases to find the data they require. Interestingly, similar problems are confronted by users of other types of software, e.g., graphic artists or users of other complex software tool kits. The heterogeneity and scope of such systems can create a logistical nightmare for their users. Although they may understand what they want to accomplish, users are awash in a sea of possible data, tools, and software routines. In the data analysis domain, scientists who can afford it hire technicians who specialize in data preparation. Those who cannot muddle through, often using the methods they are most familiar with rather than the ones most appropriate for their task.

In the context of software engineering and product design for such systems, there have been increasing efforts to create more integrated desktop environments to solve some of these problems. Indeed, the KHOROS image processing system we are working with is representative of one such effort [13]. Available for free over the Internet, KHOROS fosters an object-oriented approach to image processing. Users can draw on, and augment, a variety of toolboxes containing image processing algorithms.
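Because KHOROS toolboxes are open to user additions, any planner that reasons about them needs an algorithm catalogue that can grow without code changes. The following is a minimal sketch of such a catalogue, assuming a hypothetical `Algorithm` record and `ToolboxRegistry` class; it is not KHOROS's actual toolbox format or API, and the toolbox and algorithm names are invented for illustration. Each entry records the data type an algorithm consumes and produces, plus default parameter settings, so the planner can ask which routines apply to a given kind of data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Algorithm:
    """Hypothetical description of one image processing routine."""
    name: str
    input_type: str   # kind of data the routine consumes
    output_type: str  # kind of data the routine produces
    parameters: dict = field(default_factory=dict)  # default parameter settings


class ToolboxRegistry:
    """Hypothetical open catalogue: toolboxes can be created or extended at any time."""

    def __init__(self) -> None:
        self._toolboxes: dict[str, dict[str, Algorithm]] = {}

    def add(self, toolbox: str, algorithm: Algorithm) -> None:
        # Create the toolbox on first use; re-adding a name replaces the old entry.
        self._toolboxes.setdefault(toolbox, {})[algorithm.name] = algorithm

    def applicable(self, data_type: str) -> list[str]:
        """Return "toolbox.algorithm" names of every routine that accepts data_type."""
        return sorted(
            f"{toolbox}.{alg.name}"
            for toolbox, algs in self._toolboxes.items()
            for alg in algs.values()
            if alg.input_type == data_type
        )


registry = ToolboxRegistry()
registry.add("preprocess", Algorithm("calibrate", "raw_band", "radiance"))
registry.add("preprocess", Algorithm("georegister", "radiance", "registered_band",
                                     {"resampling": "nearest"}))
registry.add("classify", Algorithm("cluster", "registered_band", "class_map",
                                   {"n_classes": 8}))
# A user augments an existing toolbox; no planner code has to change.
registry.add("classify", Algorithm("threshold", "registered_band", "class_map",
                                   {"cutoff": 0.5}))

print(registry.applicable("registered_band"))  # → ['classify.cluster', 'classify.threshold']
```

Keeping algorithm knowledge in data rather than in planner code is one way to absorb the constant change in an open toolbox suite: a new routine becomes visible to queries such as `applicable` as soon as it is registered.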
Algorithms can be selected from these toolboxes and combined to create visual image processing data flow diagrams (plans) using a GUI editor called Cantata. However, even with these tools, the expertise required to create such plans is substantial.

In the AI planning community, there have been growing efforts to automate parts of the data analysis process. For instance, several planning researchers have begun to study how data access plans can be generated to help users find the information they need [5, 16]. In contrast, our work focuses on aiding scientists in their use of the image processing and geographic information systems that sit on their desks. That is, given a high-level task description, the goal of our application is to decompose it into a partially ordered set of steps corresponding to transformation algorithms executable on a particular platform. Other planning work in this vein is being done by Short [4, 15], Chien [3], Matwin [11], and Boddy [2].

The role of our planner can be viewed as much like that of a logistical assistant or technician [8]. While a data analysis planner does not require deep knowledge about a particular scientific discipline, it can usefully be imbued with information about the steps making up typical data processing tasks; the algorithms available on various platforms; and the requirements of these algorithms, such as their parameter settings and their applicability to various data types. Interestingly, this is also the kind of mundane (yet volatile) information that a scientist would rather not deal with. The net effect is that the planner fills a role that is desired and valued, which also increases the likelihood of its eventual acceptance and use.

Of course, to be truly useful, a data analysis planner cannot sit in a vacuum. Ideally, it should be connected to the platforms on which the algorithms will be executed, so that a plan can be downloaded into the input format of a particular platform and executed there.
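To make the idea of a partially ordered plan concrete, here is a minimal sketch, assuming a hypothetical decomposition of a vegetation-mapping task into five invented step names; it is not COLLAGE's plan representation or planning algorithm. The plan records only the orderings that matter, and Python's standard `graphlib.TopologicalSorter` (Python 3.9+) groups the steps into stages whose members remain mutually unordered.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each step maps to the set of steps that must precede it.
# The two calibration steps are left unordered relative to each other.
plan = {
    "calibrate_red": set(),
    "calibrate_nir": set(),
    "compute_ndvi": {"calibrate_red", "calibrate_nir"},
    "classify": {"compute_ndvi"},
    "display": {"classify"},
}


def execution_stages(partial_order):
    """Group steps into stages; steps within a stage have no ordering constraint
    between them, so an executor could run them in any order or in parallel."""
    sorter = TopologicalSorter(partial_order)
    sorter.prepare()  # raises graphlib.CycleError if the ordering is inconsistent
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output stable
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished, releasing its successors
    return stages


for number, stage in enumerate(execution_stages(plan), start=1):
    print(number, stage)
# 1 ['calibrate_nir', 'calibrate_red']
# 2 ['compute_ndvi']
# 3 ['classify']
# 4 ['display']
```

Because `prepare()` rejects a cyclic ordering, this also serves as a cheap consistency check before a plan is translated into a particular platform's input format.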
A data analysis planner should also be connected to a framework that expedites data selection. Given an integrated planning architecture of this kind, a variety of issues must be reckoned with: utility (i.e., breadth and depth of capability), ease of use, and openness to the natural evolution of component systems. The rest of this paper describes our experiences in building such an architecture. Section 2 describes the overall framework and the problems we have faced. Section 3 focuses on some of the larger issues that underlie these problems.

2 COLLAGE/KHOROS Link: Core Issues

Figure 1 depicts the overall architecture of our integrated data analysis framework. In collaboration with a team from NASA’s Goddard Space Flight Center, we are integrating the COLLAGE planner into Goddard’s IIFS (the Intelligent Information Fusion System), an object-oriented framework for ingesting and storing remotely sensed data, generating derived products and information about that data, and aiding the user in data selection [14] (see Figure 2). Scientists using the IIFS can use the KHOROS image processing system to create desired data products. COLLAGE serves as a front end to this framework, planning exactly which KHOROS algorithms should be used to achieve a particular task. Ultimately, it is also our intention to link other image processing and geographic information sys-