Personal Workspace for Large-Scale Data-Driven Computational Experiment

As the scale and complexity of data-driven computational science grow, so does the burden on scientists and students of managing the data products used and generated during experiments. Products must be moved and directories created. Search support in traditional file systems is arcane. While storage management tools can store rich metadata, they do not address the particular needs of the individual computational science researcher, whether working alone or cooperatively. We have developed a personal workspace tool, myLEAD, that actively manages metadata and data products for users. Inspired by the Globus Metadata Catalog Service (MCS) and layered on top of the UK e-Science OGSA-DAI tool, myLEAD provides capture, storage, and search tools to the computational scientist. In this paper we experimentally evaluate the performance of the myLEAD metadata catalog.
