RT-GRID: a real-time, fault-tolerant grid model

Computational grids bring together distributed resources belonging to different organizations and provide a high-performance computing environment. However, another category of applications, those with timing constraints and fault-tolerance requirements, also needs to run on such a high-performance platform. It is therefore necessary to incorporate real-time and fault-tolerant properties into the computational grid. We propose a real-time, fault-tolerant grid model called RT-GRID. We discuss how the core components of RT-GRID differ from those of a computational grid. We study the issues arising from real-time and fault-tolerance requirements at the grid system level and propose a monolithic architecture.
