Database Support for Data-Driven Scientific Applications in the Grid

In this paper we describe a service-oriented software system that provides basic database support for the efficient execution of applications that use scientific datasets in the Grid. The system supports two core operations: efficient selection of the data of interest from distributed databases, and efficient transfer of data from storage nodes to compute nodes for processing. We present its overall architecture and main components and describe preliminary experimental results.
