Fault Tolerance Performance Evaluation of Large-Scale Distributed Storage Systems: HDFS and Ceph Case Study

Large-scale distributed systems are collections of loosely coupled computers interconnected by a communication network. They are now an integral part of everyday life, underpinning large web applications, social networks, peer-to-peer systems, wireless sensor networks, and more. At such a scale, individual hardware components are prone to failure, so a key challenge in designing distributed storage systems is tolerating faults. To this end, fault tolerance mechanisms such as replication have been used for decades to provide high availability. More recently, many systems have started to support erasure coding, which is expected to achieve high reliability at a lower storage cost than replication. However, the reduced storage overhead comes at the cost of more complicated recovery, which hurts performance. In this paper, we study the fault tolerance mechanisms of two representative distributed file systems: HDFS and Ceph. In addition to traditional replication, both HDFS and Ceph support erasure coding in their latest versions. We evaluate the replication and erasure coding implementations of both systems using standard benchmarks and fault injection, and quantitatively measure performance and storage overhead. Our results demonstrate the trade-offs between replication and erasure coding, and serve as a foundation for building storage systems that offer both high availability and high performance.
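
To make the storage-overhead trade-off concrete, the sketch below (our illustration, not code from the paper) compares n-way replication with a Reed-Solomon RS(k, m) erasure code; the RS(6,3) parameters match HDFS's default erasure-coding policy, and 3-way replication is the common default in both HDFS and Ceph.

```python
# Illustrative arithmetic: storage overhead of n-way replication
# versus an RS(k, m) erasure code (k data blocks + m parity blocks).

def replication_overhead(n: int) -> float:
    """n-way replication stores n full copies, so overhead is n times
    the raw data size; it tolerates up to n - 1 replica losses."""
    return float(n)

def erasure_overhead(k: int, m: int) -> float:
    """RS(k, m) stripes data into k blocks plus m parity blocks, so
    overhead is (k + m) / k; it tolerates up to m lost blocks."""
    return (k + m) / k

print(replication_overhead(3))  # 3.0x storage, tolerates 2 failures
print(erasure_overhead(6, 3))   # 1.5x storage, tolerates 3 failures
```

The arithmetic shows why erasure coding is attractive at scale: RS(6,3) halves the storage cost of 3-way replication while surviving one more failure, at the price of reconstructing lost blocks from multiple surviving nodes during recovery.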
