Sparsity exploiting erasure coding for distributed storage of versioned data

In this paper we study the problem of reliably storing an archive of versioned data. Specifically, we focus on systems that store the differences (deltas) between successive versions rather than the whole objects, a typical model for storing versioned data. For reliability, we propose erasure coding techniques that exploit the sparsity of information in the deltas when storing them in a distributed back-end storage system, which improves I/O read performance when retrieving the whole versioned archive. Along with the basic techniques, we propose a few optimization heuristics, and we evaluate the techniques’ efficacy both analytically and with numerical simulations.
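To make the delta-based storage model concrete, the following is a minimal sketch of it only, not the coding scheme proposed in the paper. It assumes equal-length versions and a bytewise XOR delta; the function names (`delta`, `sparsity`, `to_archive`, `restore`) are hypothetical and chosen for illustration. The erasure coding step that would exploit the sparsity is deliberately left out.

```python
from typing import List


def delta(prev: bytes, curr: bytes) -> bytes:
    """Bytewise XOR difference between two equal-length versions (an assumption of this sketch)."""
    if len(prev) != len(curr):
        raise ValueError("this sketch only handles equal-length versions")
    return bytes(a ^ b for a, b in zip(prev, curr))


def sparsity(d: bytes) -> int:
    """Number of non-zero symbols in a delta; small values mean few changes."""
    return sum(1 for b in d if b != 0)


def to_archive(versions: List[bytes]) -> List[bytes]:
    """Keep the first version in full, then only the deltas between successive versions."""
    archive = [versions[0]]
    for prev, curr in zip(versions, versions[1:]):
        archive.append(delta(prev, curr))
    return archive


def restore(archive: List[bytes]) -> List[bytes]:
    """Rebuild every version by applying the deltas in order (XOR is its own inverse)."""
    versions = [archive[0]]
    for d in archive[1:]:
        versions.append(delta(versions[-1], d))
    return versions


versions = [b"hello world!", b"hello World!", b"hellO World?"]
archive = to_archive(versions)
print([sparsity(d) for d in archive[1:]])
print(restore(archive) == versions)
```

In this toy archive each delta has only one or two non-zero symbols out of twelve, which is the kind of sparsity the abstract refers to. Restoring the latest version requires reading the first version and every delta, which is why read I/O for the whole archive matters.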
