On the Delay-Storage Trade-Off in Content Download from Coded Distributed Storage Systems

We study how coding in distributed storage reduces expected download time, in addition to providing reliability against disk failures. The expected download time falls because, when a content file is encoded with redundancy and distributed across multiple disks, reading any sufficiently large subset of the disks is enough to reconstruct the content. For the same total storage, coding exploits the diversity in storage better than simple replication and hence gives faster download. We use a fork-join queueing framework to model multiple users requesting the content simultaneously, and derive bounds on the expected download time. Our system model and results generalize the fork-join system studied in the queueing theory literature. Our results demonstrate the fundamental trade-off between the expected download time and the amount of storage space. This trade-off can guide the choice of how much redundancy is needed to meet delay constraints on content delivery.
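As a rough illustration of why reading only a subset of disks helps, the sketch below computes the expected download time for a single request with no queueing. It is a simplification made here for illustration, not the fork-join model with multiple users that the abstract describes. The assumptions are ours: the file is split into k blocks and encoded into n blocks, any k of which suffice; block read times are i.i.d. exponential; and a block of size 1/k is read at rate k*mu, where mu is the rate for the whole file. The function name and parameters are hypothetical.

```python
def expected_download_time(n, k, mu=1.0):
    """Expected time until any k of n parallel block reads finish.

    Assumptions (illustrative only): a single request, no queueing,
    and i.i.d. exponential read times with rate k * mu per block,
    since each block is 1/k of the file. The expected k-th smallest of
    n i.i.d. Exp(r) variables is (1/r) * sum_{i=n-k+1}^{n} 1/i.
    """
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    block_rate = k * mu
    return sum(1.0 / (block_rate * i) for i in range(n - k + 1, n + 1))


# Both schemes use total storage n / k = 2 times the file size.
replication = expected_download_time(n=2, k=1)  # two full copies
coded = expected_download_time(n=4, k=2)        # four half-size coded blocks
print(f"replication (2,1): {replication:.4f}")
print(f"coding      (4,2): {coded:.4f}")
```

Under these assumptions, with mu = 1, two-fold replication gives an expected time of 1/2, while the (4,2) code gives 7/24 for the same total storage, which is consistent with the claim that coding uses storage diversity better than replication.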
