GDIA: A Scalable Grid Infrastructure for Data Intensive Applications

Applications in many scientific fields, such as bioinformatics and high-energy physics, increasingly demand computing infrastructures that provide more computing power and support ever larger amounts of distributed data. GDIA, built on top of the CSF4 meta-scheduler and the Gfarm data grid, is a scalable grid infrastructure for data intensive applications. In this paper, we present the architecture of GDIA and the new enhancements made to CSF4 within GDIA. First, we introduce a flexible user proxy delegation mechanism that enables a job to run with a full proxy. With this enhancement, jobs can access grid services that have strict security requirements, such as Gfarm. Second, we redesign CSF4's resource manager service to support protocols other than WS GRAM. Finally, we discuss scheduling issues in grids. GDIA can coordinate heterogeneous clusters belonging to different virtual organizations (VOs) through either a centralized or a decentralized model. GDIA has been successfully deployed on the PRAGMA grid testbed to schedule data intensive applications.
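One way to picture a resource manager that is no longer tied to WS GRAM is as an adapter layer: the scheduler talks to a single interface, and each cluster is registered with whichever submission protocol it actually speaks. The following is a minimal, hypothetical sketch of that idea only. The names `Job`, `ClusterProtocol`, `WsGramAdapter`, `LocalSchedulerAdapter` and `ResourceManager` are illustrative assumptions, not CSF4's real API, and the adapters return placeholder identifiers instead of contacting any grid service.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Job:
    """Minimal job description (hypothetical fields, for illustration only)."""
    name: str
    executable: str
    input_files: list = field(default_factory=list)


class ClusterProtocol(ABC):
    """Hypothetical adapter interface: one subclass per submission protocol."""

    @abstractmethod
    def submit(self, job: Job) -> str:
        """Submit a job and return a protocol-specific job identifier."""


class WsGramAdapter(ClusterProtocol):
    """Placeholder for a WS GRAM backend; it does not contact a real service."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self._count = 0

    def submit(self, job: Job) -> str:
        # A real adapter would build a WS GRAM job description and invoke the
        # GRAM service at self.endpoint; here we only fabricate an identifier.
        self._count += 1
        return f"wsgram:{self.endpoint}/{job.name}/{self._count}"


class LocalSchedulerAdapter(ClusterProtocol):
    """Placeholder for a non-GRAM protocol, e.g. a local batch scheduler."""

    def __init__(self, cluster: str):
        self.cluster = cluster
        self._count = 0

    def submit(self, job: Job) -> str:
        # Stand-in for submitting through some other protocol.
        self._count += 1
        return f"local:{self.cluster}/{job.name}/{self._count}"


class ResourceManager:
    """Routes each job to the protocol adapter registered for its target cluster."""

    def __init__(self):
        self._adapters = {}

    def register(self, cluster: str, adapter: ClusterProtocol) -> None:
        self._adapters[cluster] = adapter

    def submit(self, cluster: str, job: Job) -> str:
        try:
            adapter = self._adapters[cluster]
        except KeyError:
            raise ValueError(f"no protocol registered for cluster {cluster!r}") from None
        return adapter.submit(job)


if __name__ == "__main__":
    rm = ResourceManager()
    rm.register("clusterA", WsGramAdapter("https://a.example.org:8443"))
    rm.register("clusterB", LocalSchedulerAdapter("clusterB"))
    job = Job("blast-1", "/usr/bin/blastall")
    print(rm.submit("clusterA", job))
    print(rm.submit("clusterB", job))
```

With this design, the scheduler never branches on the protocol. Supporting an additional protocol means writing one more adapter class and registering it, while the scheduling logic stays unchanged.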
