Grid Data Farm for Petascale Data Intensive Computing

High-performance, data-intensive computing and networking technology has become a vital part of large-scale scientific research projects in areas such as high energy physics, astronomy, space exploration, and human genome projects. One example is the Large Hadron Collider (LHC) project at CERN, where four major experiment groups will generate on the order of a petabyte of raw data each year from four large underground particle detectors, with data acquisition starting in 2006. Grid technology will play an essential role in constructing worldwide data analysis environments in which thousands of physicists will collaborate and compete in particle physics data analysis at the energy frontier. A multi-tier “Regional Centers” worldwide computing model has been studied by the MONARC Project[1]. It consists of a Tier-0 center at CERN, multiple Tier-1 centers in participating continents, tens of Tier-2 centers in participating countries, and many Tier-3 centers in universities and institutes. Grid Data Farm is a petascale data-intensive computing project initiated in Japan. The project is a collaboration among KEK (High Energy Accelerator Research Organization), ETL/TACC (Electrotechnical Laboratory / Tsukuba Advanced Computing Center), the University of Tokyo, and Tokyo Institute of Technology (Titech). The challenge is to construct a data processing framework that can handle the hundreds of terabytes to petabytes of data produced by the ATLAS experiment at the LHC. KEK and the University of Tokyo will collaborate to build a Tier-1 regional center in Japan. The underlying hardware will be a PC cluster on the scale of thousands of nodes, each node providing nearly a terabyte of storage; data arriving from CERN at a sustained rate of approximately 600 Mbps will be systematically stored and subjected to intensive processing.
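To give a sense of scale for the roughly 600 Mbps incoming stream, the short calculation below converts a sustained link rate into stored volume per day and per year. It is a back-of-the-envelope sketch, not part of the project's design: the function name is hypothetical, and it assumes decimal units (1 TB = 10^12 bytes), a 365-day year, and a link running at the full rate without interruption.

```python
# Back-of-the-envelope estimate of data volume from a sustained link rate.
# Assumptions (not from the source): decimal units, 365-day year,
# and 100% link utilization with no protocol overhead.

BITS_PER_BYTE = 8
SECONDS_PER_DAY = 86_400


def sustained_volume_bytes(rate_mbps: float, days: float) -> float:
    """Bytes transferred at a constant rate (megabits per second) over `days` days."""
    bytes_per_second = rate_mbps * 1_000_000 / BITS_PER_BYTE
    return bytes_per_second * SECONDS_PER_DAY * days


per_day_tb = sustained_volume_bytes(600, 1) / 1e12     # terabytes per day
per_year_pb = sustained_volume_bytes(600, 365) / 1e15  # petabytes per year

print(f"{per_day_tb:.2f} TB/day, {per_year_pb:.2f} PB/year")
# prints "6.48 TB/day, 2.37 PB/year"
```

Under these assumptions the stream amounts to a few petabytes per year at most, which is the same order of magnitude as the petabyte-scale data volumes quoted for the experiment.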
The Grid Data Farm will provide the following features for collider data processing, and will also serve as a framework for other types of data-intensive scientific applications:

[1] Ian T. Foster, et al. Globus: a Metacomputing Infrastructure Toolkit, 1997, Int. J. High Perform. Comput. Appl.

[2] John Linn, et al. Generic Security Service Application Program Interface Version 2, Update 1, 2000, RFC.

[3] Mitsuhisa Sato, et al. Design Issues of Network Enabled Server Systems for the Grid, 2000, GRID.

[4] Mitsuhisa Sato, et al. Design and implementations of Ninf: towards a global computing infrastructure, 1999, Future Gener. Comput. Syst.

[5] S. Sekiguchi, et al. World-wide computing infrastructure: global and local partnership, 1997, Proceedings of IEEE International Symposium on Parallel Algorithms Architecture Synthesis.