On the Synchronization Bottleneck of OpenStack Swift-Like Cloud Storage Systems

As one of the most popular types of cloud storage services, OpenStack Swift and its follow-up systems replicate each object across multiple storage nodes and leverage object sync protocols to achieve high reliability and eventual consistency. The performance of object sync protocols heavily relies on two key parameters: $r$ (the number of replicas of each object) and $n$ (the number of objects hosted by each storage node). In existing tutorials and demos, the default configuration is typically $r = 3$ and $n < 1,000$, and the sync process appears to perform well. However, we discover that in data-intensive scenarios, e.g., when $r > 3$ and $n \gg 1,000$, the sync process is significantly delayed and produces massive network overhead, which we refer to as the sync bottleneck problem. By reviewing the source code of OpenStack Swift, we find that its object sync protocol uses a fairly simple but network-intensive approach to check the consistency among replicas of objects; hence, in each sync round, the number of hash values exchanged per node is $\Theta(n \times r)$. To tackle the problem, we propose a lightweight and practical object sync protocol, LightSync, which not only remarkably reduces the sync overhead but also preserves high reliability and eventual consistency. LightSync derives this capability from three novel building blocks: 1) Hashing of Hashes, which aggregates all $h$ hash values of each data partition into a single representative hash value using a Merkle tree; 2) Circular Hash Checking, which checks the consistency of different partition replicas by sending the aggregated hash value only to the clockwise neighbor; and 3) Failed Neighbor Handling, which properly detects and handles node failures with moderate overhead to effectively strengthen the robustness of LightSync.
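To make the first two building blocks concrete, below is a minimal, illustrative Python sketch (OpenStack Swift itself is written in Python). It assumes SHA-256 over hex-encoded object hashes; the function names (merkle_root, clockwise_neighbor, partition_in_sync) are hypothetical illustrations of the described ideas, not the actual LightSync patch API.

```python
import hashlib

def merkle_root(obj_hashes):
    """Hashing of Hashes: aggregate the h object hashes of one partition
    into a single representative hash via a Merkle (binary hash) tree."""
    level = [bytes.fromhex(h) for h in obj_hashes]
    if not level:                              # empty partition: hash of nothing
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:                     # odd level: duplicate the last node
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

def clockwise_neighbor(node_id, ring):
    """Circular Hash Checking: each replica holder sends its aggregated hash
    only to its clockwise successor on the sorted consistent-hashing ring."""
    ring = sorted(ring)
    return ring[(ring.index(node_id) + 1) % len(ring)]

def partition_in_sync(my_root, neighbor_root):
    """One checking step: compare a single aggregated hash instead of the
    h per-object hashes; a mismatch would trigger a repair of the partition."""
    return my_root == neighbor_root

# Example: a 4-object partition whose two replicas agree.
objs = [hashlib.sha256(f"object-{i}".encode()).hexdigest() for i in range(4)]
root = merkle_root(objs)
assert partition_in_sync(root, merkle_root(objs))
print(clockwise_neighbor("node-b", ["node-a", "node-b", "node-c"]))  # -> node-c
```

In this sketch, a node computes one Merkle root per partition and compares it with the root held by its clockwise successor on the consistent-hashing ring, so a consistent partition costs a single exchanged hash rather than $h$ per-object hashes for every pair of replica holders.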
The design of LightSync offers a provable guarantee on reducing the per-node network overhead from $\Theta(n \times r)$ to $\Theta(\frac{n}{h})$. Furthermore, we have implemented LightSync as an open-source patch and applied it to OpenStack Swift, reducing the sync delay by up to 879$\times$ and the network overhead by up to 47.5$\times$.
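As a concrete illustration of this guarantee (the numbers below are ours, chosen for exposition, not taken from the paper's evaluation), consider a node hosting $n = 100{,}000$ objects with $r = 5$ replicas and partitions of $h = 128$ objects. Per sync round, the exchanged hash values drop from

$n \times r = 500{,}000 \quad\longrightarrow\quad \lceil n / h \rceil = 782,$

a reduction of roughly $640\times$ in per-node hash traffic.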
