Marching Band: Fault-Tolerance with Replicable Message Delivery Order
[1] D. Manivannan et al. FINE: A Fully Informed aNd Efficient communication-induced checkpointing protocol for distributed systems, 2009, J. Parallel Distributed Comput.
[2] Jeffrey Overbey et al. A type and effect system for deterministic parallel Java, 2009, OOPSLA '09.
[3] Thomas Hérault et al. Correlated set coordination in fault tolerant message logging protocols for many-core clusters, 2013, Concurr. Comput. Pract. Exp.
[4] Robbert van Renesse et al. Building adaptive systems using Ensemble, 1998.
[5] Arkadiusz Danilecki et al. Forced Replicable Execution for a Subset of Piecewise Deterministic Applications with Deterministic Message Passing, 2014, 2014 15th International Conference on Parallel and Distributed Computing, Applications and Technologies.
[6] Kenneth P. Birman et al. Exploiting virtual synchrony in distributed systems, 1987, SOSP '87.
[7] Kenneth P. Birman et al. A review of experiences with reliable multicast, 1999, Softw. Pract. Exp.
[8] Priya Narasimhan et al. Static Analysis Meets Distributed Fault-Tolerance: Enabling State-Machine Replication with Nondeterminism, 2006, HotDep.
[9] Luis Ceze et al. DDOS: taming nondeterminism in distributed systems, 2013, ASPLOS '13.
[10] Franck Cappello et al. On Communication Determinism in Parallel HPC Applications, 2010, 2010 Proceedings of 19th International Conference on Computer Communications and Networks.
[11] D. Manivannan et al. HOPE: A Hybrid Optimistic checkpointing and selective Pessimistic mEssage logging protocol for large scale distributed systems, 2012, Future Gener. Comput. Syst.
[12] Andrzej Goscinski et al. A survey and review of the current state of rollback-recovery for cluster systems, 2009.
[13] Idit Keidar et al. Group communication specifications: a comprehensive study, 2001, CSUR.
[14] Kenneth P. Birman. A review of experiences with reliable multicast, 1999.
[15] Shahram Rahimi et al. Domino-Effect Free Crash Recovery for Concurrent Failures in Cluster Federation, 2008, GPC.
[16] Sam Toueg et al. Unreliable failure detectors for reliable distributed systems, 1996, JACM.
[17] Marcos K. Aguilera et al. Efficient atomic broadcast using deterministic merge, 2000, PODC '00.
[18] Brian Randell et al. System structure for software fault tolerance, 1975, IEEE Transactions on Software Engineering.
[19] Roy Friedman et al. Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations, 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing.
[20] Luis Ceze et al. Deterministic Process Groups in dOS, 2010, OSDI.
[21] Hong Ong et al. VCCP: A transparent, coordinated checkpointing system for virtualization-based cluster computing, 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[22] David B. Johnson et al. Sender-Based Message Logging, 1987.
[23] Franck Cappello et al. HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[24] Stephen A. Edwards et al. A Determinizing Compiler, 2009.
[25] Ion Stoica et al. ODR: output-deterministic replay for multicore debugging, 2009, SOSP '09.