Enabling HPC E-Science via Integrated Grid Infrastructure

High Performance Computing (HPC) E-Science has requirements well beyond those of the environments that suffice for routine computation. Data requirements in particular may reach the multi-terabyte regime, with transfer rates of several Gb/s or more. Such data capabilities may be available at only a single location. Understanding the data through data mining and visualization, on the other hand, may require completely different facilities, while the data production itself may be possible only at yet a third location. A Global File System with multi-Gb/s speeds and hundreds of terabytes of capacity can help researchers, but using several systems simultaneously also requires a reasonably sophisticated co-scheduling capability. In this paper, we show how the TeraGrid combines massive computational clusters, a Global File System, very powerful visualization systems, and a co-scheduler to enable massive E-Science in a coordinated and usable manner.
