An Approach to Manage Reconfiguration in Fault-Tolerant Distributed Systems

This paper deals with dynamic resource management for real-time, dependability-critical distributed systems. The requirements for such systems span many domains, such as time, survivability, and scalability, and fulfilling them poses formidable challenges. An architecture is proposed that is based on Lira, an agent-based distributed infrastructure, and enriched with statistical models that provide decision-making capabilities. The proposed architecture aims to provide adaptive system reconfiguration, relying on a hierarchy of resource managers to address fault tolerance and scalability issues.
