SOR-HDFS: a SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS

In this paper, we propose SOR-HDFS, a SEDA (Staged Event-Driven Architecture)-based approach to improve the performance of the HDFS Write operation. The design not only incorporates RDMA-based communication over InfiniBand but also maximizes overlapping among the different stages of data transfer and I/O. Performance evaluations show that the new design improves the aggregated write throughput of the Enhanced DFSIO benchmark in Intel HiBench by up to 64% and reduces the job execution time by 37% compared to IPoIB (IP over InfiniBand). Compared to the previous best RDMA-enhanced design [4], the improvements in throughput and execution time are 30% and 20%, respectively. Our design also improves the performance of the HBase Put operation by up to 53% over IPoIB and 29% compared to the previous best RDMA-enhanced HDFS. To the best of our knowledge, this is the first SEDA-based design of HDFS in the literature.
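To illustrate the general SEDA idea that the abstract refers to, the following is a minimal sketch, not the SOR-HDFS implementation. It assumes three hypothetical stages (read, process, write), each with its own bounded event queue and worker thread, so that one packet can be in the write stage while the next is still being processed. The stage names, the `Stage` class, and `run_pipeline` are all illustrative assumptions and do not come from the paper.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the event stream


class Stage:
    """One SEDA-style stage: a bounded input queue plus a worker thread."""

    def __init__(self, name, handler, downstream=None, capacity=8):
        self.name = name
        self.handler = handler
        self.downstream = downstream
        # A bounded queue applies back-pressure to the upstream stage.
        self.inbox = queue.Queue(maxsize=capacity)
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is _STOP:
                # Propagate shutdown so downstream stages drain and exit.
                if self.downstream is not None:
                    self.downstream.inbox.put(_STOP)
                return
            result = self.handler(event)
            if self.downstream is not None:
                self.downstream.inbox.put(result)


def run_pipeline(packets):
    """Push packets through hypothetical read -> process -> write stages."""
    written = []
    write = Stage("write", written.append)
    # Upper-casing stands in for whatever per-packet work a real stage does.
    process = Stage("process", lambda p: (p[0], p[1].upper()), downstream=write)
    read = Stage("read", lambda p: p, downstream=process)
    stages = [read, process, write]
    for stage in stages:
        stage.start()
    for seq, payload in enumerate(packets):
        read.inbox.put((seq, payload))
    read.inbox.put(_STOP)
    for stage in stages:
        stage.thread.join()
    return written
```

Because each stage in this sketch has a single worker and FIFO queues, packets leave the pipeline in arrival order; a design with several workers per stage would need explicit sequencing to keep that property.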

[1] Alan L. Cox, et al. The Hadoop distributed filesystem: Balancing portability and performance, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[2] David E. Culler, et al. SEDA: an architecture for well-conditioned, scalable internet services, 2001, SOSP.

[3] Dhabaleswar K. Panda, et al. High performance RDMA-based design of HDFS over InfiniBand, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[4] Dhabaleswar K. Panda, et al. A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters, 2012, WBDB.

[5] Dhabaleswar K. Panda, et al. Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?, 2013, 2013 IEEE 21st Annual Symposium on High-Performance Interconnects.