A framework for modeling and prototyping of real-time dependable systems

Due to the increasing use of computer systems in critical and complex applications, e.g. process control, flight monitoring, medical care systems and transport systems, issues related to reliability, safety and fault tolerance have gained importance in the past decade. For complex hardware systems there are well-defined methodologies to address these issues (130). For complex software systems these techniques cannot be easily applied, because it is difficult to correct programmers' mistakes automatically (1). It is also very difficult to anticipate all possible changes that can occur in the environment interacting with the software. Even if the design is correct with respect to the system specification, such situations must be considered and mitigated as much as possible. Another important aspect is that errors can arise from erroneous data from the environment, undetected hardware failures, and/or undetected design errors in the hardware/software components. The focus of this dissertation is the design and development of a methodology for applications whose requirements include hard real-time constraints, distributed operation, and fault tolerance. This is accomplished by extending models of computation to encapsulate both fault tolerance and temporal constraints. To assure the correctness of the temporal behavior of a hard real-time application, our approach is based on an extended Petri net model, called the G-Net, enhanced with a deterministic timing scheme. In the Timed G-Net (TGN) model, our modeling strategy is to specify the timing constraint of a requested method (operation) by a reserved time. The basic idea is to use building blocks with timing properties so that objects can treat a timing error by raising an exception and triggering corrective actions. In the realm of fault tolerance, we further refine the TGN model into the Smart Object (SMO) model.
A smart object is an object with an associated knowledge structure that supports deciding which actions to take based on its internal parameters, its interactions with other objects, and changes in the environment. Our approach attaches monitors and error handlers to each SMO, uses redundancy for fault masking, and defines a uniform system-level fault recovery scheme. We also define a hierarchy of error handlers and monitors that can be replicated. In addition, we have developed a technique to estimate application reliability in terms of the inputs to the application. This method is applicable to a large class of applications that can be generated recursively.
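
As an informal illustration of the two mechanisms described above (a reserved time whose violation raises an exception, and redundancy used to mask faults), the following Python sketch may help. It is not the TGN or SMO formalism itself: the names TimingError, invoke_with_reserved_time and masked_invoke are hypothetical, the timing check is made only after the operation returns (no preemption), and simple majority voting is assumed as the masking policy.

```python
import time
from collections import Counter


class TimingError(Exception):
    """Hypothetical exception: an operation exceeded its reserved time."""


def invoke_with_reserved_time(operation, reserved_time, *args):
    """Run operation(*args); raise TimingError if it took longer than
    reserved_time seconds. The check happens after the call returns."""
    start = time.monotonic()
    result = operation(*args)
    elapsed = time.monotonic() - start
    if elapsed > reserved_time:
        raise TimingError(f"took {elapsed:.4f}s, reserved {reserved_time}s")
    return result


def masked_invoke(replicas, reserved_time, error_handler, *args):
    """Call every replica, drop those that fail or miss the reserved time,
    and return the majority result. If no strict majority of all replicas
    agrees, fall back to error_handler(*args) as the corrective action.
    Results must be hashable so they can be counted."""
    votes = Counter()
    for replica in replicas:
        try:
            votes[invoke_with_reserved_time(replica, reserved_time, *args)] += 1
        except Exception:
            continue  # the faulty or late replica is excluded from the vote
    if votes:
        value, count = votes.most_common(1)[0]
        if count > len(replicas) // 2:
            return value
    return error_handler(*args)
```

With three replicas of which one returns a wrong value, masked_invoke returns the value agreed on by the other two; if a second replica also misses its reserved time, no majority remains and the error handler is invoked instead.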