On the Synchronization Bottleneck of OpenStack Swift-Like Cloud Storage Systems

As one of the most popular types of cloud storage services, OpenStack Swift and its follow-up systems replicate each object across multiple storage nodes and leverage object sync protocols to achieve high reliability and eventual consistency. The performance of object sync protocols heavily relies on two key parameters: $r$ (the number of replicas of each object) and $n$ (the number of objects hosted by each storage node). In existing tutorials and demos, the default configuration is typically $r = 3$ and $n < 1,000$, and the sync process appears to perform well. However, we discover that in data-intensive scenarios, e.g., when $r > 3$ and $n \gg 1,000$, the sync process is significantly delayed and produces massive network overhead, which we refer to as the sync bottleneck problem. By reviewing the source code of OpenStack Swift, we find that its object sync protocol uses a fairly simple but network-intensive approach to check the consistency among replicas of objects; hence, in each sync round, the number of hash values exchanged per node is $\Theta(n \times r)$. To tackle the problem, we propose a lightweight and practical object sync protocol, LightSync, which not only remarkably reduces the sync overhead but also preserves high reliability and eventual consistency. LightSync derives this capability from three novel building blocks: 1) Hashing of Hashes, which aggregates all $h$ hash values of each data partition into a single representative hash value using a Merkle tree; 2) Circular Hash Checking, which checks the consistency of different partition replicas by sending the aggregated hash value only to the clockwise neighbor; and 3) Failed Neighbor Handling, which properly detects and handles node failures with moderate overhead to effectively strengthen the robustness of LightSync.
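To make the first two building blocks concrete, below is a minimal, illustrative Python sketch (OpenStack Swift itself is written in Python). It assumes SHA-256 over hex-encoded object hashes; the function names (merkle_root, clockwise_neighbor, partition_in_sync) are hypothetical illustrations of the described ideas, not the actual LightSync patch API.

```python
import hashlib

def merkle_root(obj_hashes):
    """Hashing of Hashes: aggregate the h object hashes of one partition
    into a single representative hash via a Merkle (binary hash) tree."""
    level = [bytes.fromhex(h) for h in obj_hashes]
    if not level:                              # empty partition: hash of nothing
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:                     # odd level: duplicate the last node
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

def clockwise_neighbor(node_id, ring):
    """Circular Hash Checking: each replica holder sends its aggregated hash
    only to its clockwise successor on the sorted consistent-hashing ring."""
    ring = sorted(ring)
    return ring[(ring.index(node_id) + 1) % len(ring)]

def partition_in_sync(my_root, neighbor_root):
    """One checking step: compare a single aggregated hash instead of the
    h per-object hashes; a mismatch would trigger a repair of the partition."""
    return my_root == neighbor_root

# Example: a 4-object partition whose two replicas agree.
objs = [hashlib.sha256(f"object-{i}".encode()).hexdigest() for i in range(4)]
root = merkle_root(objs)
assert partition_in_sync(root, merkle_root(objs))
print(clockwise_neighbor("node-b", ["node-a", "node-b", "node-c"]))  # -> node-c
```

In this sketch, a node computes one Merkle root per partition and compares it with the root held by its clockwise successor on the consistent-hashing ring, so a consistent partition costs a single exchanged hash rather than $h$ per-object hashes for every pair of replica holders.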
The design of LightSync offers a provable guarantee on reducing the per-node network overhead from $\Theta(n \times r)$ to $\Theta(\frac{n}{h})$. Furthermore, we have implemented LightSync as an open-source patch and applied it to OpenStack Swift, reducing the sync delay by up to 879$\times$ and the network overhead by up to 47.5$\times$.
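As a concrete illustration of this guarantee (the numbers below are ours, chosen for exposition, not taken from the paper's evaluation), consider a node hosting $n = 100{,}000$ objects with $r = 5$ replicas and partitions of $h = 128$ objects. Per sync round, the exchanged hash values drop from

$n \times r = 500{,}000 \quad\longrightarrow\quad \lceil n / h \rceil = 782,$

a reduction of roughly $640\times$ in per-node hash traffic.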
