On network coded distributed storage: How to repair in a fog of unreliable peers

This paper focuses on distributed fog storage solutions, in which a number of unreliable devices organize themselves into Peer-to-Peer (P2P) networks in order to reliably store their own data and that of other devices and/or local users, while providing lower delay and higher throughput than cloud storage. Cloud storage systems typically rely on expensive infrastructure with centralized control to store, repair, and access the data. This approach introduces a large delay for accessing and storing the data, driven in part by a high round-trip time (RTT) between users and the cloud. These characteristics are at odds with the massive increase in devices and generated data expected in the coming years, as well as with the low-latency requirements of many applications. We focus on characterizing optimal solutions for maintaining data availability when nodes in the fog continuously leave the network. In contrast with state-of-the-art data repair formulations, which assume that additional nodes will be available, we focus on an unaddressed problem: performing a repair within the pool of surviving nodes because no new or alternative nodes are available. We provide a mathematical characterization under different storage and network use conditions and develop a practical P2P system that achieves the predicted performance within 1 dB in measurement campaigns using commercial devices.