Grid computing: the European Data Grid Project

The goal of this project is to develop a novel environment to support globally distributed scientific exploration involving multi-petabyte datasets. The project will devise and develop middleware solutions and testbeds capable of scaling to handle many petabytes of distributed data, tens of thousands of resources (processors, disks, etc.), and thousands of simultaneous users. The scale of the problem and the distribution of the resources and user community preclude straightforward replication of the data at different sites, while the aim of providing a general-purpose application environment precludes distributing the data using static policies. We will construct this environment by combining and extending newly emerging "Grid" technologies to manage large distributed datasets in addition to computational elements. A consequence of this project will be the emergence of fundamentally new modes of scientific exploration, as access to fundamental scientific data will no longer be restricted to the producer of that data. While the project focuses on scientific applications such as High Energy Physics, Earth Sciences and Bioinformatics, issues of sharing data are germane to many applications, and thus the project has a potential impact on future industrial and commercial activities.
