Speculative execution in a distributed file system

Speculator provides Linux kernel support for speculative execution. It allows multiple processes to share speculative state by tracking causal dependencies propagated through inter-process communication. It guarantees correct execution by preventing speculative processes from externalizing output, e.g., sending a network message or writing to the screen, until the speculations on which that output depends have proven to be correct. Speculator improves the performance of distributed file systems by masking I/O latency and increasing I/O throughput. Rather than block during a remote operation, a file system predicts the operation's result, then uses Speculator to checkpoint the state of the calling process and speculatively continue its execution based on the predicted result. If the prediction is correct, the checkpoint is discarded; if it is incorrect, the calling process is restored to the checkpoint, and the operation is retried. We have modified the client, server, and network protocol of two distributed file systems to use Speculator. For PostMark and Andrew-style benchmarks, speculative execution results in a factor of 2 performance improvement for NFS over local-area networks and an order of magnitude improvement over wide-area networks. For the same benchmarks, Speculator enables the Blue File System to provide the consistency of single-copy file semantics and the safety of synchronous I/O, yet still outperform current distributed file systems with weaker consistency and safety.
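
The checkpoint, speculate, and commit-or-rollback cycle described above can be illustrated with a minimal user-space sketch. This is not Speculator's kernel implementation: the names `SpeculativeProcess`, `speculative_call`, `remote_op`, and `continuation` are hypothetical, the remote operation runs sequentially rather than overlapping with the speculative work, and causal dependencies between processes are not modeled.

```python
import copy


class SpeculativeProcess:
    """Toy stand-in for a process: state can be checkpointed, and output
    is buffered while a speculation is outstanding (hypothetical model)."""

    def __init__(self, state):
        self.state = state
        self.checkpoint = None      # saved state while speculative
        self.pending_output = []    # output held back during speculation
        self.externalized = []      # output actually released

    def speculate(self):
        # Save a copy of the state so it can be restored on misprediction.
        self.checkpoint = copy.deepcopy(self.state)

    def emit(self, message):
        # Speculative output must not be externalized until proven correct.
        if self.checkpoint is not None:
            self.pending_output.append(message)
        else:
            self.externalized.append(message)

    def commit(self):
        # Prediction was correct: discard the checkpoint, release output.
        self.checkpoint = None
        self.externalized.extend(self.pending_output)
        self.pending_output.clear()

    def rollback(self):
        # Prediction was wrong: restore the checkpoint, drop buffered output.
        self.state = self.checkpoint
        self.checkpoint = None
        self.pending_output.clear()


def speculative_call(proc, predicted, remote_op, continuation):
    """Continue with a predicted result instead of blocking on remote_op."""
    proc.speculate()
    continuation(proc, predicted)      # speculative execution
    actual = remote_op()               # a real system would overlap this
    if actual == predicted:
        proc.commit()
        return actual
    proc.rollback()
    retried = remote_op()              # retry the operation after rollback
    continuation(proc, retried)        # re-execute non-speculatively
    return retried


def continuation(proc, version):
    # Example work that depends on the result of the remote operation.
    proc.state["version"] = version
    proc.emit(f"read version {version}")


# Correct prediction: buffered output is released at commit.
hit = SpeculativeProcess({"version": 0})
speculative_call(hit, 1, lambda: 1, continuation)

# Misprediction: state is restored and the work is redone with the real result.
miss = SpeculativeProcess({"version": 0})
speculative_call(miss, 1, lambda: 2, continuation)
```

In both cases only output that depends on a verified result reaches `externalized`; the message produced under the wrong prediction is discarded along with the speculative state.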
