ECCH: Erasure Coded Consistent Hashing for Distributed Storage Systems

In this paper, we propose ECCH, an Erasure Coded Consistent Hashing scheme that improves data placement in distributed storage systems. It combines the inherent advantages of consistent hashing with the storage efficiency of erasure coding. Specifically, ECCH divides the data block stream of files into groups according to block IDs. Within each group, it encodes the data blocks and generates additional parity blocks by erasure coding. All encoded blocks in the same group are stored on different nodes, distributed by consistent hashing. On node failure or data loss, ECCH uses the ID of the missing block to locate the required blocks in the same group for fast recovery. To handle node changes, ECCH introduces multi-version hash rings to manage the data layout. This design shields erasure-coded groups from the impact of data migration, while achieving data balance with little data movement. We have implemented ECCH on Sheepdog, a distributed object-based storage system. Evaluation results show that ECCH can greatly improve the space utilization of hashing-based storage systems while achieving efficient fault tolerance.
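
The grouping, placement, and recovery steps described above can be illustrated with a minimal sketch. It is not the ECCH implementation: the names `HashRing`, `group_of`, `locate`, and `recover`, the group size `K = 4`, the SHA-1 hash, and the 64 virtual nodes per server are all assumptions made for illustration. A single XOR parity block (`M = 1`) stands in for a general erasure code such as Reed-Solomon, and the multi-version hash rings are not modeled.

```python
import bisect
import hashlib

K = 4  # data blocks per group (assumed value)
M = 1  # parity blocks per group; one XOR parity stands in for a real erasure code


def _hash(key: str) -> int:
    """Map a string key to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (hypothetical helper class)."""

    def __init__(self, nodes, vnodes=64):
        nodes = list(nodes)
        self.node_count = len(set(nodes))
        self.points = sorted((_hash(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))
        self.keys = [p for p, _ in self.points]

    def distinct_nodes(self, key: str, count: int):
        """Walk clockwise from hash(key) and collect `count` different nodes."""
        if count > self.node_count:
            raise ValueError("not enough nodes for one group")
        start = bisect.bisect_right(self.keys, _hash(key))
        chosen = []
        for i in range(len(self.points)):
            node = self.points[(start + i) % len(self.points)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == count:
                    break
        return chosen


def group_of(block_id: int) -> int:
    """Blocks with consecutive IDs fall into the same group of K data blocks."""
    return block_id // K


def place_group(ring: HashRing, group_id: int):
    """Return K + M distinct nodes: slots 0..K-1 hold data, the rest hold parity."""
    return ring.distinct_nodes(f"group-{group_id}", K + M)


def locate(ring: HashRing, block_id: int):
    """From a block ID alone, find its group, its slot, and the node storing it."""
    group_id, slot = group_of(block_id), block_id % K
    return group_id, slot, place_group(ring, group_id)[slot]


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (the stand-in parity function)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def recover(data_blocks, parity, missing_slot: int) -> bytes:
    """Rebuild one lost data block from the surviving blocks and the parity."""
    survivors = [b for i, b in enumerate(data_blocks) if i != missing_slot]
    return xor_blocks(survivors + [parity])
```

In this sketch, recovering a lost block needs only its ID: `locate` gives the group, `place_group` gives the nodes holding the surviving blocks and the parity, and `recover` rebuilds the missing block. Because every block of a group is placed from the same ring position, the group never shares a node between two of its blocks, provided the cluster has at least `K + M` nodes.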