Automatic generation of real-time, reliable, and fault-tolerant distributions/schedules

Reactive systems are increasingly present in many sectors such as the automotive industry, telecommunications, and aeronautics. These systems carry out complex tasks that are often critical. Given the catastrophic consequences that a failure of such a system could cause following hardware faults (in processors and communication media), it is essential to take fault tolerance into account in their design. Moreover, several application domains require a quantitative evaluation of the behavior of these systems with respect to the occurrence and activation of faults. To design dependable systems, I propose in this thesis three design methodologies based on scheduling theory and on active and passive redundancy of the system's software components. These three methodologies solve the problem of automatically generating real-time, reliable, and fault-tolerant distributions and schedules. Since this problem is NP-hard, all three methodologies rely on list-scheduling heuristics. More specifically, the first two methodologies address tolerance to hardware faults of processors and communication media, for architectures with point-to-point links and architectures with a bus link, respectively. The third methodology addresses the quantitative evaluation of a distribution/schedule in terms of reliability, using an original bi-criteria heuristic. These methodologies perform well on randomly generated algorithm and architecture graphs.
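
To make the idea of a list-scheduling heuristic combined with active redundancy concrete, here is a minimal Python sketch. It is not one of the thesis heuristics: the function name `list_schedule`, the parameter `npf` (number of processor failures to tolerate), the constant communication delay `comm`, the homogeneous fully connected processors, and the "longest task first" priority rule are all assumptions made for illustration. Each task is placed on `npf + 1` distinct processors, so that at least one replica survives up to `npf` processor failures.

```python
from typing import Dict, List, Tuple

Replica = Tuple[int, float, float]  # (processor, start, finish)


def list_schedule(exec_time: Dict[str, float],
                  preds: Dict[str, List[str]],
                  n_procs: int,
                  npf: int,
                  comm: float = 1.0) -> Dict[str, List[Replica]]:
    """Greedy list scheduling with npf + 1 active replicas per task.

    Assumptions (hypothetical, for illustration only): identical
    processors, a constant delay `comm` between distinct processors,
    and a failure-free timeline where a replica uses the earliest
    arriving copy of each input.
    """
    if npf + 1 > n_procs:
        raise ValueError("need at least npf + 1 processors")
    proc_free = [0.0] * n_procs          # time at which each processor is idle
    schedule: Dict[str, List[Replica]] = {}
    remaining = set(exec_time)

    def data_ready(task: str, proc: int) -> float:
        # Earliest time at which all inputs of `task` are available on `proc`.
        ready_at = 0.0
        for p in preds.get(task, []):
            arrival = min(f + (0.0 if q == proc else comm)
                          for q, _, f in schedule[p])
            ready_at = max(ready_at, arrival)
        return ready_at

    while remaining:
        # Candidate list: tasks whose predecessors are all scheduled.
        ready = [t for t in remaining
                 if all(p in schedule for p in preds.get(t, []))]
        if not ready:
            raise ValueError("cycle or unknown predecessor in the task graph")
        # Simple priority rule: longest task first, ties broken by name.
        task = min(ready, key=lambda t: (-exec_time[t], t))
        # Rank processors by the finish time they would give this task.
        candidates = sorted(
            (max(proc_free[q], data_ready(task, q)) + exec_time[task], q)
            for q in range(n_procs))
        schedule[task] = []
        for finish, q in candidates[:npf + 1]:
            schedule[task].append((q, finish - exec_time[task], finish))
            proc_free[q] = finish
        remaining.remove(task)
    return schedule
```

For example, `list_schedule({"A": 2, "B": 3, "C": 1}, {"B": ["A"], "C": ["A"]}, n_procs=3, npf=1)` places two replicas of each task on distinct processors. The sketch only covers the replication and placement step; it does not model media failures, passive redundancy, or the reliability criterion mentioned in the abstract.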
