Large-scale distributed systems are collections of loosely coupled computers interconnected by a communication network. With the growth of large web applications, social networks, peer-to-peer systems, wireless sensor networks, and many others, they have become an integral part of everyday life. Because each disk is individually prone to failure, a key challenge in designing such systems is tolerating faults. Fault tolerance mechanisms such as replication are therefore widely used to keep data available. Many systems now also support a newer mechanism, erasure coding (EC), which is claimed to provide high reliability at a lower storage cost than replication. However, this saving comes at the cost of performance. Our goal in this paper is to compare the performance and storage requirements of these two data reliability techniques in two open source systems, HDFS and Ceph. The comparison is timely because the Apache Software Foundation has released Apache Hadoop 3.0.0, which now supports EC, and Ceph added EC support with its Firefly release (May 2014). We tested replication against EC in both systems using several benchmarks shipped with them. The results show that there are trade-offs between replication and EC in terms of performance and storage requirements.
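To make the storage side of this trade-off concrete, the following minimal sketch compares the raw capacity consumed by n-way replication with that of a (k, m) erasure code, where k data chunks are extended with m parity chunks. The specific parameters (3 copies, and k = 6 with m = 3) are illustrative assumptions and not configurations reported in this abstract; the function names are hypothetical, and the sketch does not model the performance cost, which is what the benchmarks measure.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte under n-way replication."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def replication_tolerated_failures(copies: int) -> int:
    """Copies that can be lost while one copy still survives."""
    return copies - 1


def ec_overhead(k: int, m: int) -> float:
    """Raw bytes stored per logical byte under a (k, m) erasure code.

    Each stripe holds k data chunks plus m parity chunks of equal size.
    """
    if k < 1 or m < 0:
        raise ValueError("k must be >= 1 and m must be >= 0")
    return (k + m) / k


def ec_tolerated_failures(m: int) -> int:
    """Chunks per stripe that can be lost, assuming an MDS code
    such as Reed-Solomon (an assumption of this sketch)."""
    return m


if __name__ == "__main__":
    logical_tb = 100  # hypothetical amount of user data, in TB

    # Assumed example configurations, chosen only for illustration.
    rep = replication_overhead(3)
    ec = ec_overhead(6, 3)

    print(f"3-way replication: {logical_tb * rep:.0f} TB raw, "
          f"tolerates {replication_tolerated_failures(3)} lost copies")
    print(f"EC (k=6, m=3):     {logical_tb * ec:.0f} TB raw, "
          f"tolerates {ec_tolerated_failures(3)} lost chunks per stripe")
```

Under these assumed parameters, erasure coding stores half as many raw bytes as replication while tolerating one more failure per stripe. The price is the extra encoding work on writes and reconstruction work on degraded reads and recovery.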