The design and implementation of the Warp Transactional Filesystem

This paper introduces the Warp Transactional Filesystem (WTF), a novel, transactional, POSIX-compatible filesystem based on a new file slicing API that enables efficient zero-copy file transformations. WTF provides transactional access spanning multiple files in a distributed filesystem. Further, the file slicing API enables applications to construct files from the contents of other files without having to rewrite or relocate data. Combined, these features enable a new class of high-performance applications. Experiments show that WTF can outperform the industry-standard HDFS distributed filesystem by up to a factor of four in a sorting benchmark, by reducing I/O costs. Microbenchmarks indicate that the new features of WTF impose only a modest overhead on top of the POSIX-compatible API.
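The idea of constructing a file from the contents of other files without rewriting data can be illustrated with a toy in-memory model. The sketch below is not WTF's actual API: the names `Slice`, `BlockStore`, `SlicedFile`, `slice_range`, and `append_slices` are hypothetical, and the model ignores transactions, distribution, and replication. It assumes only that a file is an ordered list of pointers to immutable byte ranges, so that a new file can be assembled by copying pointers rather than bytes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Slice:
    """Hypothetical pointer to an immutable byte range held by the store."""
    block_id: int
    offset: int
    length: int


class BlockStore:
    """Toy append-only store; counts bytes written so copies can be detected."""

    def __init__(self) -> None:
        self._blocks: List[bytes] = []
        self.bytes_written = 0

    def write(self, data: bytes) -> Slice:
        self._blocks.append(bytes(data))
        self.bytes_written += len(data)
        return Slice(len(self._blocks) - 1, 0, len(data))

    def read(self, s: Slice) -> bytes:
        return self._blocks[s.block_id][s.offset:s.offset + s.length]


class SlicedFile:
    """A file modeled as an ordered list of slice pointers (metadata only)."""

    def __init__(self, store: BlockStore, slices: Optional[List[Slice]] = None) -> None:
        self.store = store
        self.slices: List[Slice] = list(slices or [])

    def append(self, data: bytes) -> None:
        # Ordinary write: new bytes go to the store once.
        self.slices.append(self.store.write(data))

    def read(self) -> bytes:
        return b"".join(self.store.read(s) for s in self.slices)

    def slice_range(self, start: int, length: int) -> List[Slice]:
        """Return pointers covering [start, start + length) without reading data."""
        out: List[Slice] = []
        end = start + length
        pos = 0  # file offset at which the current slice begins
        for s in self.slices:
            s_end = pos + s.length
            lo, hi = max(start, pos), min(end, s_end)
            if lo < hi:  # the requested range overlaps this slice
                out.append(Slice(s.block_id, s.offset + (lo - pos), hi - lo))
            pos = s_end
        return out

    def append_slices(self, slices: List[Slice]) -> None:
        # Zero-copy append: only pointers are added, no bytes are written.
        self.slices.extend(slices)


store = BlockStore()
a = SlicedFile(store)
a.append(b"hello ")
a.append(b"world")
b = SlicedFile(store)
b.append(b"0123456789")

written_before = store.bytes_written
c = SlicedFile(store)
c.append_slices(a.slice_range(3, 6))  # spans both of a's slices
c.append_slices(b.slice_range(2, 4))
print(c.read())
print(store.bytes_written == written_before)
```

In this model, building `c` touches only metadata: the store's byte counter is unchanged after `c` is assembled, which is the property the abstract describes as avoiding rewriting or relocating data. A sort could, under the same assumption, reorder records by rearranging pointers instead of rewriting the records themselves.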
