How good is random linear coding based distributed networked storage?

We consider the problem of storing one or more large files in a distributed manner over a network. In our framework there are multiple storage locations, each of which has only very limited storage space for each file. Each storage location chooses a part of the file (or a coded version of its parts) without knowing what is stored at the other locations. We want a file downloader to retrieve the entire file while connecting to as few storage locations as possible. We compare the performance of three strategies: uncoded storage, traditional erasure coding based storage, and random linear coding based storage motivated by network coding. We demonstrate that, in principle, a traditional erasure coding based strategy (e.g., Reed-Solomon codes) can do almost as well as one can ask for with an appropriate choice of parameters. However, the cost is a large amount of additional storage space required at the centralized server before distribution among the multiple locations. The random linear coding based strategy performs as well without suffering from this disadvantage. Further, with probability close to one, the minimum number of storage locations a downloader needs to connect to in order to reconstruct the entire file can be very close to that of the case where there is complete coordination between the storage locations and the downloader. We also argue that an uncoded strategy performs poorly.
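To make the random linear coding strategy concrete, the following is a minimal sketch, not the construction analyzed in the paper. It assumes the file is split into k equal-length pieces of symbols from a prime field GF(P), that each storage location independently stores one random linear combination of the pieces together with its coefficient vector, and that the downloader decodes by Gauss-Jordan elimination. The field size `P` and the function names `encode` and `decode` are illustrative choices that do not come from the source.

```python
import random

P = 65537  # prime field size; an assumption, the abstract does not fix a field


def encode(pieces, rng):
    """One storage location: pick random coefficients over GF(P), without
    coordinating with any other location, and store the coded combination."""
    coeffs = [rng.randrange(P) for _ in pieces]
    length = len(pieces[0])
    coded = [
        sum(c * piece[j] for c, piece in zip(coeffs, pieces)) % P
        for j in range(length)
    ]
    return coeffs, coded


def decode(stored, k):
    """Downloader: recover the k pieces from (coeffs, coded) pairs by
    Gauss-Jordan elimination over GF(P). Returns None when the collected
    coefficient vectors do not span all k dimensions."""
    rows = [list(coeffs) + list(coded) for coeffs, coded in stored]
    for col in range(k):
        # Find a row at or below position `col` with a nonzero entry here.
        pivot = next(
            (r for r in range(col, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            return None  # rank deficient: more locations are needed
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # inverse via Fermat's little theorem
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [
                    (x - factor * y) % P for x, y in zip(rows[r], rows[col])
                ]
    return [rows[i][k:] for i in range(k)]


if __name__ == "__main__":
    rng = random.Random()
    file_pieces = [[rng.randrange(P) for _ in range(8)] for _ in range(4)]
    locations = [encode(file_pieces, rng) for _ in range(10)]
    # Connect to only k = 4 of the 10 locations and try to rebuild the file.
    recovered = decode(rng.sample(locations, 4), k=4)
    print("recovered" if recovered == file_pieces else "need more locations")
```

In this simplified setting, k independently chosen coefficient vectors over GF(q) are linearly independent with probability (1 - 1/q)(1 - 1/q^2)...(1 - 1/q^k), which is close to one for a large field. That is one way to read the "probability close to one" claim, though the paper's exact storage model and bounds may differ from this sketch.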
