HTTP redirection for replica catalogue lookups in data grids

Data distribution and replication in distributed systems require special-purpose middleware tools for accessing replicated data. Data Grids, a particular class of distributed systems spanning wide-area networks, must handle data management issues such as the distribution and replication of data volumes on the terabyte to petabyte scale. Replica catalogues are used to catalogue and locate replicated files stored at sites distributed around the globe. We present a novel, administratively scalable approach to distributing a replica catalogue and resolving file locations using HTTP redirection. HTTP redirection servers that manage local file catalogues give each site greater flexibility and autonomy in managing its files, while a global replica catalogue provides the necessary mapping of logical files to individual sites. Because the catalogues are distributed, a site can autonomously move files internally, for example for load balancing, without notifying the global replica catalogue. In terms of catalogue administration, our approach scales well to a large number of sites and file entries and thus provides a powerful middleware service. We present the design and implementation of our catalogue redirection servers and report on promising experimental results.
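The two-tier lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `CatalogueServer`, the function `resolve`, the host names and the URL layout are all hypothetical, and real HTTP transport is replaced by in-process calls that return an HTTP-style status code and `Location` value. The global catalogue redirects a logical file name to a site, and the site-local server redirects it to the file's current physical location.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlsplit


@dataclass
class Response:
    """HTTP-style answer from a catalogue node (status code plus Location header)."""
    status: int
    location: Optional[str] = None


class CatalogueServer:
    """Hypothetical catalogue node: maps a logical file name to the next URL."""

    def __init__(self, entries: Dict[str, str]):
        self.entries = entries  # logical file name -> redirect target

    def get(self, lfn: str) -> Response:
        target = self.entries.get(lfn)
        if target is None:
            return Response(404)            # unknown logical file
        return Response(302, target)        # 302 Found, target goes in Location


def resolve(url: str, servers: Dict[str, CatalogueServer], max_hops: int = 5) -> str:
    """Follow catalogue redirects until the URL leaves the catalogue tier."""
    for _ in range(max_hops):
        parts = urlsplit(url)
        server = servers.get(parts.netloc)
        if server is None:
            return url                      # not a catalogue host: physical location
        resp = server.get(parts.path.lstrip("/"))
        if resp.status != 302 or resp.location is None:
            raise LookupError(f"{url}: HTTP {resp.status}")
        url = resp.location                 # follow the redirect
    raise LookupError(f"too many redirects resolving {url}")


# Global catalogue (hypothetical): logical file name -> site-level redirection server.
global_rc = CatalogueServer({
    "run42/events.root": "http://site-a.example.org/run42/events.root",
})
# Site-local catalogue (hypothetical): logical file name -> current location at the site.
site_a = CatalogueServer({
    "run42/events.root": "http://disk3.site-a.example.org/data/run42/events.root",
})
SERVERS = {"rc.example.org": global_rc, "site-a.example.org": site_a}

print(resolve("http://rc.example.org/run42/events.root", SERVERS))
# -> http://disk3.site-a.example.org/data/run42/events.root
```

To model site A moving the file to another disk for load balancing, only `site_a.entries` changes; `global_rc` still points at site A, which mirrors the autonomy argument in the abstract. A real deployment would serve these lookups over HTTP and let clients follow the `Location` header; the in-process calls only keep the sketch self-contained.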
