Artemis: Integrating Scientific Data on the Grid

Grid technologies provide a robust infrastructure for distributed computing and are widely used in large-scale scientific applications that generate terabytes (soon petabytes) of data. These data are described by metadata attributes that capture their properties and provenance, and the metadata are organized in a variety of catalogs distributed over the grid. To find a collection of data that share certain properties, each relevant metadata catalog must be identified and queried individually. This paper introduces Artemis, a system developed to integrate distributed metadata catalogs on the grid. Artemis exploits several AI techniques: a query mediator, a query planning and execution system, ontologies and semantic web tools that model metadata attributes, and an intelligent user interface that guides users through these ontologies as they formulate queries. We describe our experiences using Artemis with large metadata catalogs from two projects in the physics domain.
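To make the mediation idea concrete, here is a minimal sketch of a mediator that accepts one query phrased in shared ontology terms, rewrites it into each catalog's local attribute names, skips catalogs that cannot answer it, and unions the results with their provenance. This is not Artemis's actual mediator or planner, which the abstract does not describe in detail. A plain attribute-renaming table stands in for the ontology, and the `Catalog` class, the `mediate` function, the catalogs and all attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

# A metadata entry is modeled as a flat attribute -> value mapping (an assumption).
Record = dict[str, Any]


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for one remote metadata catalog."""
    name: str
    records: list[Record]
    # Shared ontology attribute -> this catalog's local attribute name.
    attribute_map: dict[str, str]

    def query(self, local_conditions: Record) -> list[Record]:
        # Exact-match selection on the catalog's local attribute names.
        return [r for r in self.records
                if all(r.get(k) == v for k, v in local_conditions.items())]


def mediate(catalogs: list[Catalog], conditions: Record) -> list[Record]:
    """Rewrite an ontology-level query for each catalog and union the answers."""
    results: list[Record] = []
    for cat in catalogs:
        # Planning step: skip catalogs that cannot express every requested attribute.
        if not all(attr in cat.attribute_map for attr in conditions):
            continue
        # Rewrite ontology terms into the catalog's local vocabulary.
        local = {cat.attribute_map[attr]: value for attr, value in conditions.items()}
        inverse = {loc: onto for onto, loc in cat.attribute_map.items()}
        for rec in cat.query(local):
            # Translate results back to ontology terms and record which catalog answered.
            unified = {inverse.get(k, k): v for k, v in rec.items()}
            unified["source"] = cat.name
            results.append(unified)
    return results


# Toy catalogs with differing local schemas (all names are hypothetical).
catalog_a = Catalog("catalog_a",
                    [{"det": "D1", "year": 2003}, {"det": "D2", "year": 2004}],
                    {"detector": "det", "run_year": "year"})
catalog_b = Catalog("catalog_b",
                    [{"detectorName": "D3", "runYear": 2003}],
                    {"detector": "detectorName", "run_year": "runYear"})
catalog_c = Catalog("catalog_c", [{"instr": "D4"}], {"detector": "instr"})

hits = mediate([catalog_a, catalog_b, catalog_c], {"run_year": 2003})
print([(h["detector"], h["source"]) for h in hits])
# prints [('D1', 'catalog_a'), ('D3', 'catalog_b')]
```

In this sketch, `catalog_c` is skipped because it has no mapping for `run_year`. A real planner might instead reorder, decompose, or partially answer such queries.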
