Split File Model for Big Data in Low-Throughput Storage

The demand for low-cost, large-scale storage is increasing. Recently, several low-throughput storage services, such as the Pogoplug Cloud, have been developed. These services are based on Amazon Glacier and offer low throughput but low cost and large capacity, which makes them suitable for backing up or archiving big data in place of offline storage tiers. To use such low-throughput storage efficiently, we need tools that support, among other features, effective deduplication and resumable transfers. We propose a split file model that can represent big data efficiently in low-throughput storage. In the split file model, a large file is divided into many small parts, which are stored in a directory. We have developed tool commands that support the use of split files transparently. With these commands, replicated data is naturally excluded and efficient shallow copying is supported. In this paper, we describe the split file model in detail and evaluate an implementation of it.
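As a rough illustration of the idea (a sketch, not the authors' actual implementation), the following Python code splits a file into fixed-size parts stored in a directory. Naming each part by a hash of its contents is one plausible way replicated parts would be excluded naturally, since identical parts collapse to the same file. The part size, the hash-based naming, and the function names are assumptions made here for illustration; the paper's concrete scheme may differ.

```python
import hashlib
from pathlib import Path

# Assumed part size (illustrative; not specified in the abstract).
PART_SIZE = 4 * 1024 * 1024  # 4 MiB

def split_file(src: str, dest_dir: str) -> list[str]:
    """Split `src` into fixed-size parts stored in `dest_dir`.

    Each part is named by the SHA-256 of its contents, so identical
    parts (replicated data) map to the same file and are stored once.
    Returns the ordered list of part names, which acts as an index
    for reassembling the original file.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    index = []
    with open(src, "rb") as f:
        while chunk := f.read(PART_SIZE):
            name = hashlib.sha256(chunk).hexdigest()
            part = dest / name
            if not part.exists():  # deduplication: skip parts already stored
                part.write_bytes(chunk)
            index.append(name)
    # Record the part order so the file can be reconstructed later.
    (dest / "index.txt").write_text("\n".join(index) + "\n")
    return index

def join_file(src_dir: str, dest: str) -> None:
    """Reassemble the original file from its parts using the index."""
    src = Path(src_dir)
    names = (src / "index.txt").read_text().splitlines()
    with open(dest, "wb") as out:
        for name in names:
            out.write((src / name).read_bytes())
```

Under this scheme, the other benefits named in the abstract follow naturally: an interrupted upload can resume at the first missing part rather than restarting the whole file, and a shallow copy can be made by duplicating only the small index while the parts themselves are shared.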