Speculative execution in a distributed file system

Speculator provides Linux kernel support for speculative execution. It allows multiple processes to share speculative state by tracking causal dependencies propagated through interprocess communication. It guarantees correct execution by preventing speculative processes from externalizing output (for example, sending a network message or writing to the screen) until the speculations on which that output depends have proven to be correct. Speculator improves the performance of distributed file systems by masking I/O latency and increasing I/O throughput. Rather than block during a remote operation, a file system predicts the operation's result, then uses Speculator to checkpoint the state of the calling process and speculatively continue its execution based on the predicted result. If the prediction is correct, the checkpoint is discarded; if it is incorrect, the calling process is restored to the checkpoint, and the operation is retried. We have modified the client, server, and network protocol of two distributed file systems to use Speculator. For PostMark and Andrew-style benchmarks, speculative execution results in a factor-of-2 performance improvement for NFS over local area networks and an order-of-magnitude improvement over wide area networks. For the same benchmarks, Speculator enables the Blue File System to provide the consistency of single-copy file semantics and the safety of synchronous I/O, yet still outperform current distributed file systems with weaker consistency and safety.
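
The checkpoint, predict, and commit-or-rollback cycle described above can be illustrated with a small user-level sketch. This is not Speculator's kernel interface: the class `SpeculativeProcess`, the function `speculative_call`, and all of their methods are hypothetical names invented for illustration, and the "process" is just a Python dictionary. The sketch assumes a single speculation at a time and, on a misprediction, simply re-runs the continuation with the actual result rather than reissuing the remote operation.

```python
import copy


class SpeculativeProcess:
    """Toy stand-in for a process whose state can be checkpointed (hypothetical)."""

    def __init__(self, state):
        self.state = state
        self.checkpoint = None        # saved copy of state while speculative
        self.pending_output = []      # output held back during speculation
        self.external_output = []     # output visible to the outside world

    def emit(self, message):
        # Output is never externalized while a speculation is outstanding.
        if self.checkpoint is not None:
            self.pending_output.append(message)
        else:
            self.external_output.append(message)

    def begin_speculation(self):
        self.checkpoint = copy.deepcopy(self.state)

    def commit(self):
        # Prediction was correct: discard the checkpoint, release held output.
        self.checkpoint = None
        self.external_output.extend(self.pending_output)
        self.pending_output.clear()

    def rollback(self):
        # Prediction was wrong: restore the checkpoint, drop held output.
        self.state = self.checkpoint
        self.checkpoint = None
        self.pending_output.clear()


def speculative_call(proc, remote_op, predicted, continuation):
    """Run `continuation` on a predicted result, then validate the prediction."""
    proc.begin_speculation()
    continuation(proc, predicted)     # continue without waiting for the reply
    actual = remote_op()              # a real system would overlap this with the work above
    if actual == predicted:
        proc.commit()
    else:
        proc.rollback()
        continuation(proc, actual)    # re-execute non-speculatively
    return actual


def record_version(proc, version):
    # Example continuation: update local state and produce output.
    proc.state["version"] = version
    proc.emit(f"read version {version}")


hit = SpeculativeProcess({"version": 0})
speculative_call(hit, lambda: 7, 7, record_version)    # prediction correct

miss = SpeculativeProcess({"version": 0})
speculative_call(miss, lambda: 9, 7, record_version)   # prediction wrong
```

In the mispredicted case, the message produced under the wrong prediction never reaches `external_output`; only the output of the re-execution does. That is the property the abstract describes as preventing speculative processes from externalizing output.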
