Download Time Analysis for Distributed Storage Codes With Locality and Availability

Availability codes have recently been proposed to facilitate efficient retrieval of frequently accessed (hot) data objects in distributed storage systems. This paper presents techniques for analyzing the download time of systematic availability codes under the Fork-Join scheme for data access. Specifically, we consider the setup in which requests arrive for downloading individual data objects, and each request is replicated (forked) to the systematic server containing the object and to all of its recovery groups. In the low-traffic regime, when there is at most one request in the system, we compute the download time in closed form and compare it across systems with availability, maximum distance separable (MDS), and replication codes. We demonstrate that availability codes can reduce download time in some settings but are not always optimal. When the low-traffic assumption does not hold, the system consists of multiple interdependent Fork-Join queues, which makes exact analysis intractable due to state space explosion. For this case, we present upper and lower bounds on the download time, and an M/G/1 queue approximation for several special cases of interest. Through extensive numerical simulations, we evaluate our bounds and demonstrate that the M/G/1 queue approximation is highly accurate.
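To make the low-traffic setting concrete, the following is a minimal sketch under stated assumptions, not the paper's own derivation. It assumes that every server has an i.i.d. exponential service time with rate `mu`, that the requested object has `t` disjoint recovery groups of `r` servers each, and that a download finishes as soon as either the systematic server or every server in any one recovery group responds. The function names and parameters are hypothetical. Under these assumptions the tail of the download time T is

P(T > x) = e^{-mu x} (1 - (1 - e^{-mu x})^r)^t,

and integrating it gives E[T] = (1/mu) * sum_{k=0}^{t} C(t, k) (-1)^k / (r k + 1).

```python
import math
import random


def mean_download_time(t, r, mu=1.0):
    """Mean download time with a single request in the system.

    Assumes i.i.d. Exp(mu) service times (an assumption of this sketch),
    one systematic server, and t recovery groups of r servers each.
    """
    total = sum(math.comb(t, k) * (-1) ** k / (r * k + 1) for k in range(t + 1))
    return total / mu


def simulate_download_time(t, r, mu=1.0, samples=100_000, seed=0):
    """Monte Carlo estimate of the same quantity, as a cross-check."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        # Time for the systematic server to return the object.
        finish = rng.expovariate(mu)
        for _ in range(t):
            # A recovery group is done only when all r of its servers are done.
            group = max(rng.expovariate(mu) for _ in range(r))
            finish = min(finish, group)
        acc += finish
    return acc / samples


if __name__ == "__main__":
    for t, r in [(0, 1), (1, 2), (3, 2)]:
        exact = mean_download_time(t, r)
        approx = simulate_download_time(t, r)
        print(f"t={t} r={r} closed form={exact:.4f} simulated={approx:.4f}")
```

With `t = 0` the expression reduces to the mean of a single exponential server, `1/mu`, and with `r = 1` it reduces to the mean of the minimum of `t + 1` exponentials, which is a quick sanity check on the formula. Service-time distributions other than exponential would need a different tail expression.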
