Simulation Model and Instrument to Evaluate Replication Techniques

Fault tolerance in distributed systems relies heavily on some form of replication. Replication can also reduce access latency and bandwidth consumption in large-scale distributed systems. However, for large volumes of data, the replica placement strategy and the consistency algorithm become key factors in the performance of a data replication scheme. We present a simulation model designed to realistically evaluate replication solutions for large-scale distributed systems, implemented as an extension of the MONARC simulator. In this context, we also present a scalable architecture designed to facilitate the adoption of data replication strategies in such systems. The solution combines a hybrid replication model with a proposed fault-tolerant strategy for data consistency. We report evaluation results for this strategy obtained with the MONARC simulator.
