Paper information - Tolérance aux pannes pour objets actifs asynchrones : modèle, protocole et expérimentations. (Fault tolerance for asynchronous active objects: model, protocol and experiments)

[1] David B. Johnson,et al. Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing , 1988, J. Algorithms.

[2] Shigeru Chiba,et al. A metaobject protocol for C++ , 1995, OOPSLA.

[3] Franco Zambonelli. On the effectiveness of distributed checkpoint algorithms for domino-free recovery , 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing.

[4] V. Garg,et al. Happened Before is the Wrong Model for Potential Causality , 1998 .

[5] David B. Johnson,et al. Efficient transparent optimistic rollback recovery for distributed application programs , 1993, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.

[6] D. Manivannan,et al. Asynchronous recovery without using vector timestamps , 2002, J. Parallel Distributed Comput.

[7] Franck Cappello,et al. Grid'5000: a large scale, reconfigurable, controlable and monitorable Grid platform , 2005 .

[8] Denis Caromel,et al. Efficient, flexible, and typed group communications in Java , 2002, JGI '02.

[9] Kai Li,et al. Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.

[10] Pierre Sens,et al. The performance of independent checkpointing in distributed systems , 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.

[11] Gerard Tel,et al. Synchronous, asynchronous, and causally ordered communication , 1996, Distributed Computing.

[12] D. Manivannan,et al. Quasi-Synchronous Checkpointing: Models, Characterization, and Classification , 1999, IEEE Trans. Parallel Distributed Syst.

[13] Denis Caromel,et al. Promised Consistency for Rollback Recovery , 2006 .

[14] Denis Caromel,et al. Asynchronous and deterministic objects , 2004, POPL.

[15] Bruno Ciciani,et al. A VP-accordant checkpointing protocol preventing useless checkpoints , 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems.

[16] Achour Mostéfaoui,et al. Communication-Induced Determination of Consistent Snapshots , 1999, IEEE Trans. Parallel Distributed Syst.

[17] David B. Johnson,et al. Sender-Based Message Logging , 1987 .

[18] Denis Caromel,et al. ProActive: an integrated platform for programming and running applications on Grids and P2P systems , 2006 .

[19] David F. Bacon,et al. Volatile logging in n-fault-tolerant distributed systems , 1988, The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[20] Jian Xu,et al. Necessary and Sufficient Conditions for Consistent Global Snapshots , 1995, IEEE Trans. Parallel Distributed Syst.

[21] Denis Caromel,et al. Balancing active objects on a peer to peer infrastructure , 2005, XXV International Conference of the Chilean Computer Science Society (SCCC'05).

[22] Jason Duell,et al. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing , 2005, Int. J. High Perform. Comput. Appl.

[23] Robert E. Strom,et al. Optimistic recovery in distributed systems , 1985, TOCS.

[24] F. Mattern. On the Relativistic Structure of Logical Time in Distributed Systems , 2009 .

[25] Carl E. Landwehr,et al. Basic concepts and taxonomy of dependable and secure computing , 2004, IEEE Transactions on Dependable and Secure Computing.

[26] Daniel Marques,et al. Recent advances in checkpoint/recovery systems , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[27] Leslie Lamport,et al. Cheap Paxos , 2004, International Conference on Dependable Systems and Networks, 2004.

[28] W. Kent Fuchs,et al. Progressive retry for software error recovery in distributed systems , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[29] Denis Caromel,et al. Peer-to-peer for computational grids: mixing clusters and desktop machines , 2007, Parallel Comput.

[30] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.

[31] Denis Caromel,et al. Un protocole de tolérance aux pannes pour objets actifs non préemptifs , 2005, Tech. Sci. Informatiques.

[32] Sy-Yen Kuo,et al. An Efficient Time-Based Checkpointing Protocol for Mobile Computing Systems over Mobile IP , 2003, Mob. Networks Appl.

[33] D. Manivannan,et al. A low-overhead recovery technique using quasi-synchronous checkpointing , 1996, Proceedings of 16th International Conference on Distributed Computing Systems.

[34] Roberto Baldoni,et al. An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems , 1999, IEEE Trans. Parallel Distributed Syst.

[35] James R. Russell,et al. Optimistic failure recovery for very large networks , 1991, Proceedings Tenth Symposium on Reliable Distributed Systems.

[36] Song Jiang,et al. Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[37] Laxmikant V. Kalé,et al. FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI , 2004, 2004 IEEE International Conference on Cluster Computing.

[38] Denis Caromel,et al. A Simple Security-Aware MOP for Java , 2001, Reflection.

[39] Leslie Lamport,et al. Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[40] Denis Caromel,et al. A Hybrid Message Logging-CIC Protocol for Constrained Checkpointability , 2005, Euro-Par.

[41] John F. Karpovich,et al. Support for extensibility and site autonomy in the Legion grid system object model , 2003, J. Parallel Distributed Comput.

[42] Achour Mostéfaoui,et al. Communication-based prevention of useless checkpoints in distributed computations , 2000, Distributed Computing.

[43] Daniel Marques,et al. Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs , 2004, Proceedings of the ACM/IEEE SC2004 Conference.

[44] Cristina V. Lopes,et al. Aspect-oriented programming , 1999, ECOOP Workshops.

[45] N. Vaidya. Distributed Recovery Units: An Approach for Hybrid and Adaptive Distributed Recovery , 1993 .

[46] Achour Mostéfaoui,et al. Characterization of consistent global checkpoints in large-scale distributed systems , 1995, Proceedings of the Fifth IEEE Computer Society Workshop on Future Trends of Distributed Computing Systems.

[47] Jean-Charles Fabre,et al. Using Compile-Time Reflection for Objects' State Capture , 1999, Reflection.

[48] Sam Toueg,et al. Unreliable failure detectors for reliable distributed systems , 1996, JACM.

[49] Michel Raynal,et al. Consistency Issues in Distributed Checkpoints , 1999, IEEE Trans. Software Eng.

[50] Thomas Hérault,et al. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[51] Thomas Hérault,et al. Improved message logging versus improved coordinated checkpointing for fault tolerant MPI , 2004, 2004 IEEE International Conference on Cluster Computing.

[52] Roy Friedman,et al. Virtual machine based heterogeneous checkpointing , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[53] Jason Maassen,et al. Ibis: a flexible and efficient Java‐based Grid programming environment , 2005, Concurr. Pract. Exp.

[54] Augusto Ciuffoletti,et al. A Distributed Domino-Effect free recovery Algorithm , 1984, Symposium on Reliability in Distributed Software and Database Systems.

[55] Hiroshi Nakamura,et al. Skewed checkpointing for tolerating multi-node failures , 2004, Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.

[56] Divyakant Agrawal,et al. Using message semantics to reduce rollback in optimistic message logging recovery schemes , 1994, 14th International Conference on Distributed Computing Systems.

[57] Achour Mostéfaoui,et al. Preventing useless checkpoints in distributed computations , 1997, Proceedings of SRDS'97: 16th IEEE Symposium on Reliable Distributed Systems.

[58] Shigeru Chiba,et al. OpenJava: A Class-Based Macro System for Java , 1999, Reflection and Software Engineering.

[59] Yin-Min Wang,et al. Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints , 1997, IEEE Trans. Computers.

[60] Sara Bouchenak,et al. Pickling threads state in the Java system , 2000, Proceedings 33rd International Conference on Technology of Object-Oriented Languages and Systems TOOLS 33.

[61] Swaroop Sridhar,et al. A POLL-FREE, LOW-LATENCY APPROACH TO PROCESS STATE CAPTURE / RECOVERY IN HETEROGENEOUS COMPUTING SYSTEMS , 2002 .

[62] Willy Zwaenepoel,et al. The performance of consistent checkpointing , 1992, Proceedings 11th Symposium on Reliable Distributed Systems.

[63] Robert Tappan Morris,et al. Ivy: a read/write peer-to-peer file system , 2002, OSDI '02.

[64] Nitin H. Vaidya,et al. Staggered Consistent Checkpointing , 1999, IEEE Trans. Parallel Distributed Syst.

[65] Vijay K. Garg,et al. Addressing false causality while detecting predicates in distributed programs , 1998, Proceedings. 18th International Conference on Distributed Computing Systems.

[66] Christian Delbé. Causal Ordering of Asynchronous Request Services , .

[67] Daniel Marques,et al. C3: A System for Automating Application-Level Checkpointing of MPI Programs , 2003, LCPC.

[68] Lorenzo Alvisi,et al. Causality tracking in causal message-logging protocols , 2002, Distributed Computing.

[69] Denis Caromel,et al. A theory of distributed objects - asynchrony, mobility, groups, components , 2005 .

[70] Harrick M. Vin,et al. The Cost of Recovery in Message Logging Protocols , 2000, IEEE Trans. Knowl. Data Eng.

[71] Wouter Joosen,et al. Portable Support for Transparent Thread Migration in Java , 2000, ASA/MA.

[72] F. Cappello,et al. Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[73] Friedemann Mattern,et al. Virtual Time and Global States of Distributed Systems , 2002 .

[74] Souza dos Santos. Persistent Java , 1996 .

[75] Luís Moura Silva,et al. System-level versus user-defined checkpointing , 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems.

[76] Luís Moura Silva,et al. The performance of coordinated and independent checkpointing , 1999, Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999.

[77] Pierre Sens,et al. DARX - a framework for the fault-tolerant support of agent software , 2003, 14th International Symposium on Software Reliability Engineering, 2003. ISSRE 2003.

[78] Denis Caromel,et al. A Fault Tolerance protocol for ASP calculus: Design and Proof , 2004 .

[79] Luís Moura Silva,et al. Using message semantics for fast-output commit in checkpointing-and-rollback recovery , 1999, Proceedings of the 32nd Annual Hawaii International Conference on Systems Sciences (HICSS-32).

[80] E. N. Elnozahy,et al. Checkpointing for peta-scale systems: a look into the future of practical rollback-recovery , 2004, IEEE Transactions on Dependable and Secure Computing.

[81] Michel Raynal,et al. Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems , 2002, IEEE Distributed Syst. Online.

[82] Sacha Krakowiak,et al. Experiences implementing efficient Java thread serialization, mobility and persistence , 2004, Softw. Pract. Exp.

[83] Douglas Thain,et al. Distributed computing in practice: the Condor experience , 2005, Concurr. Pract. Exp.

[84] Brian Randell,et al. System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.

[85] Willy Zwaenepoel,et al. Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit , 1992, IEEE Trans. Computers.

[86] Denis Conan,et al. Tolérance aux fautes par recouvrement arrière dans les systèmes informatiques répartis , 1996 .

[87] Jim Waldo,et al. A Note on Distributed Computing , 1996, Mobile Object Systems.

[88] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.

[89] Lorenzo Alvisi,et al. An analysis of communication induced checkpointing , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing.

[90] Leslie Lamport,et al. Distributed snapshots: determining global states of distributed systems , 1985, TOCS.

[91] Vijay K. Garg,et al. Debugging distributed programs using controlled re-execution , 2000, PODC '00.

[92] Steven J. Deitz,et al. Compiler support for automatic checkpointing , 2002, Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications.

[93] Anne-Marie Kermarrec,et al. Peer-to-Peer Membership Management for Gossip-Based Protocols , 2003, IEEE Trans. Computers.

[94] A. Prasad Sistla,et al. Efficient distributed recovery using message logging , 1989, PODC '89.

[95] Vijay K. Garg,et al. Optimistic recovery in multi-threaded distributed systems , 1999, Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems.

[96] Ami Marowka,et al. The GRID: Blueprint for a New Computing Infrastructure , 2000, Parallel Distributed Comput. Pract.

[97] Marvin Theimer,et al. Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs , 2000, SIGMETRICS '00.

[98] L. Alvisi,et al. A Survey of Rollback-Recovery Protocols , 2002 .

[99] Lorenzo Alvisi,et al. Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure, recovery , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.