A compiler-enabled model- and measurement-driven adaptation environment for dependability and performance

Traditional techniques for building dependable, high-performance distributed systems are too expensive for most non-critical systems, often causing dependability to be sidelined as a design goal. Nevertheless, systems are expected to be dependable, and if dependability could be provided at a lower cost, many applications would stand to benefit. We believe that compiler techniques can both enable novel dependability mechanisms and enhance existing ones, yielding a wider range of cost/dependability tradeoffs than is currently available. Similarly, compilers can assist error detection by expanding the range of errors that can be detected. New compiler techniques, combined with model-driven adaptation and control mechanisms, can dynamically guide a system as it chooses among cost, dependability, and performance tradeoffs in response to faults and changes in the environment. This paper reports on a new project that is exploring this approach. The broad goal of the work is to create a powerful yet flexible runtime environment for dependable, high-performance systems that operate within much lower cost constraints than are currently possible.
