The Earth System Grid: Supporting the Next Generation of Climate Modeling Research

Understanding the Earth's climate system and how it might be changing is a preeminent scientific challenge. Global climate models are used to simulate past, present, and future climates, and experiments are executed continuously on an array of distributed supercomputers. The resulting data archive, spread over several sites, currently contains upwards of 100 TB of simulation data and is growing rapidly. Looking toward mid-decade and beyond, we must anticipate and prepare for distributed climate research data holdings of many petabytes. The Earth System Grid (ESG) is a collaborative, interdisciplinary project that aims to enable the management, discovery, access, and analysis of these critically important datasets in a distributed and heterogeneous computational environment. The problem is fundamentally a Grid problem. Building upon the Globus Toolkit and a variety of other technologies, ESG is developing an environment that addresses authentication, authorization for data access, large-scale data transport and management, services and abstractions for high-performance remote data access, mechanisms for scalable data replication, cataloging with rich semantic and syntactic information, data discovery, distributed monitoring, and Web-based portals for using the system.
