Enabling Distributed Petascale Science

Petascale science is an end-to-end endeavour, involving not only the creation of massive datasets at supercomputers or experimental facilities, but also the subsequent analysis of that data by a user community that may be distributed across many laboratories and universities. The new SciDAC Center for Enabling Distributed Petascale Science (CEDPS) is developing tools to support this end-to-end process. These tools include data placement services for the reliable, high-performance, secure, and policy-driven placement of data within a distributed science environment; tools and techniques for the construction, operation, and provisioning of scalable science services; and tools for the detection and diagnosis of failures in end-to-end data placement and distributed application-hosting configurations. In each area we build on a strong base of existing technology, and we have made useful progress in the first year of the project. For example, we have recently achieved order-of-magnitude improvements in transfer times for large numbers of small files and implemented asynchronous data-staging capabilities; demonstrated dynamic deployment of complex application stacks for the STAR experiment; and designed and deployed end-to-end troubleshooting services. We look forward to working with SciDAC application and technology projects to realize the promise of petascale science.
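
To make the asynchronous data-staging idea above concrete, the following is a minimal sketch of a placement queue that accepts transfer requests and moves files in the background via the GridFTP command-line client. It is an illustration only: the queue structure, retry policy, class names, and example endpoint URLs are assumptions made for this sketch, not the CEDPS data placement service itself.

"""
Minimal sketch of an asynchronous data-staging queue, loosely modeled on the
data placement services described above.  Illustration only: the queue
structure, retry policy, and use of globus-url-copy as the transfer tool are
assumptions, not the CEDPS implementation.
"""
import queue
import subprocess
import threading


class StagingRequest:
    """A single source-to-destination transfer request."""
    def __init__(self, src_url: str, dst_url: str, retries: int = 3):
        self.src_url = src_url
        self.dst_url = dst_url
        self.retries = retries


class AsyncStager:
    """Accepts staging requests and processes them on a background worker,
    so the submitting application does not block on data movement."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[StagingRequest]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, request: StagingRequest) -> None:
        # Returns immediately; the transfer happens asynchronously.
        self._queue.put(request)

    def wait(self) -> None:
        """Block until all submitted transfers have been attempted."""
        self._queue.join()

    def _run(self) -> None:
        while True:
            req = self._queue.get()
            for attempt in range(1, req.retries + 1):
                try:
                    # globus-url-copy is the GridFTP command-line client;
                    # any other transfer tool could be substituted here.
                    result = subprocess.run(
                        ["globus-url-copy", req.src_url, req.dst_url],
                        capture_output=True,
                    )
                except FileNotFoundError:
                    print("globus-url-copy not found on this host")
                    break
                if result.returncode == 0:
                    print(f"staged {req.src_url} -> {req.dst_url}")
                    break
                print(f"attempt {attempt} failed: "
                      f"{result.stderr.decode().strip()}")
            self._queue.task_done()


if __name__ == "__main__":
    stager = AsyncStager()
    # Hypothetical endpoints, for illustration only.
    stager.submit(StagingRequest(
        "gsiftp://source.example.org/data/run001.dat",
        "gsiftp://dest.example.org/staging/run001.dat",
    ))
    stager.submit(StagingRequest(
        "gsiftp://source.example.org/data/run002.dat",
        "gsiftp://dest.example.org/staging/run002.dat",
    ))
    stager.wait()  # wait for queued transfers before exiting

Submitting a request returns immediately, so an application is never blocked on wide-area data movement; the background worker retries failed transfers a bounded number of times, in the spirit of the reliability and policy-driven placement goals stated above.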
