DeltaCFS: Boosting Delta Sync for Cloud Storage Services by Learning from NFS

Cloud storage services, such as Dropbox, iCloud Drive, Google Drive, and Microsoft OneDrive, have greatly facilitated file synchronization across users' heterogeneous devices. Among them, Dropbox-like services are particularly beneficial owing to their delta sync functionality, which strives for greater network-level efficiency. However, delta sync trades computation overhead for network-traffic savings, and under some typical workloads this tradeoff can be highly unfavorable. We refer to this problem as the abuse of delta sync. To address it, we propose DeltaCFS, a novel file sync framework for cloud storage services that learns from the design of the conventional NFS (Network File System). Specifically, we combine delta sync with NFS-like file RPC in an adaptive manner, significantly cutting computation overhead on both the client and server sides while preserving network-level efficiency. DeltaCFS also enables a neat design for guaranteeing causal consistency and fine-grained version control of files. In our open-source, FUSE-based prototype, DeltaCFS outperforms Dropbox by generating up to 11x less data transfer and up to 100x less computation overhead under the examined workloads.
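To make the computation-for-traffic tradeoff concrete, the following is a minimal sketch of rsync-style delta sync (weak rolling checksum plus strong hash per block). It is an illustration of the general technique only, not the implementation used by Dropbox or by DeltaCFS; the function names `signature`, `delta`, and `patch` and the 4-byte `BLOCK` size are hypothetical choices for readability. Note that the receiver hashes every block of the old file and the sender evaluates a checksum at every byte offset of the new file, so CPU cost grows with file size even when only a few bytes changed.

```python
import hashlib

BLOCK = 4  # hypothetical tiny block size for illustration; real systems use KB-sized blocks


def weak_checksum(data: bytes):
    """Adler-style weak checksum (a, b) of one block, as in the rsync algorithm (mod 2^16)."""
    a = sum(data) % 65536
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % 65536
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window by one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % 65536
    b = (b - n * out_byte + a) % 65536
    return a, b


def signature(old: bytes):
    """Receiver side: map weak checksum -> [(block index, strong hash)] for full blocks of the old file."""
    sigs = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        blk = old[off:off + BLOCK]
        sigs.setdefault(weak_checksum(blk), []).append((off // BLOCK, hashlib.md5(blk).digest()))
    return sigs


def delta(new: bytes, sigs):
    """Sender side: emit ("copy", block_index) for matched blocks and ("data", bytes) for literals."""
    ops, literal, i, ab = [], bytearray(), 0, None
    while i + BLOCK <= len(new):
        if ab is None:
            ab = weak_checksum(new[i:i + BLOCK])
        match = None
        for blk_idx, strong in sigs.get(ab, []):
            # The strong hash rules out weak-checksum collisions.
            if hashlib.md5(new[i:i + BLOCK]).digest() == strong:
                match = blk_idx
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += BLOCK
            ab = None  # recompute the checksum at the next non-overlapping window
        else:
            literal.append(new[i])
            # Roll the window forward one byte if another full window exists.
            ab = roll(ab[0], ab[1], new[i], new[i + BLOCK], BLOCK) if i + BLOCK < len(new) else None
            i += 1
    literal.extend(new[i:])  # trailing bytes shorter than a block are sent literally
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops):
    """Receiver side: rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for kind, val in ops:
        out += old[val * BLOCK:(val + 1) * BLOCK] if kind == "copy" else val
    return bytes(out)


old = b"hello world, cloud sync!"
new = b"hello brave world, cloud sync!"
ops = delta(new, signature(old))
assert patch(old, ops) == new
```

Only the literal bytes and the block references cross the network, which is the traffic saving; the hashing over both files is the computation that, per the abstract, can outweigh that saving under some workloads.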
