Performance Impacts with Reliable Parallel File Systems at Exascale Level

The introduction of Exascale storage into production systems will lead to an increase in the number of storage servers needed by parallel file systems. In this scenario, parallel file system designers should move from the current replication configurations to the more space- and energy-efficient erasure-coded configurations between storage servers. Unfortunately, current trends in energy efficiency favor a larger number of less powerful clients (light-weight Exascale nodes), which increases the frequency of write requests and therefore creates more parity update requests. In this paper, we investigate RAID-5 and RAID-6 parity-based reliability organizations in Exascale storage systems. We propose two software mechanisms to improve the performance of write requests. The first mechanism reduces the number of operations needed to update a parity block, improving write performance by up to 200%. The second mechanism allows applications to signal when their data needs reliability, delaying the parity calculation and improving performance by up to 300%. Using our proposals, traditional replication schemes can be replaced by reliability models like RAID-5 or RAID-6 without the expected performance loss.
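
To make the cost of a parity update concrete, the following is a minimal sketch of the textbook RAID-5 small-write (read-modify-write) parity update, in which the new parity is the old parity XORed with the old and new contents of the modified block. It is not the mechanism proposed in this paper; it only illustrates the baseline operation whose cost the abstract refers to. The function names (`xor_blocks`, `full_parity`, `small_write_parity`) and the tiny 4-byte blocks are illustrative assumptions.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(blocks: list[bytes]) -> bytes:
    """RAID-5 parity of a stripe: XOR of all its blocks."""
    return reduce(xor_blocks, blocks)


def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write update: needs the old data and old parity (two reads)
    before the new data and new parity can be written (two writes)."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


# Hypothetical stripe of three 4-byte data blocks (sizes chosen for illustration).
stripe = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]
parity = full_parity(stripe)

# Overwrite block 1 and update the parity without reading the other data blocks.
new_block = bytes([0xAA] * 4)
updated_parity = small_write_parity(parity, stripe[1], new_block)
stripe[1] = new_block

# The incremental update matches a full recomputation over the stripe.
assert updated_parity == full_parity(stripe)

# If block 1 is lost, it can be rebuilt from the surviving blocks and the parity.
recovered = full_parity([stripe[0], stripe[2], updated_parity])
assert recovered == new_block
```

Each small write in this baseline touches both the data server and the parity server, which is why a growing number of clients issuing frequent writes translates directly into more parity update traffic.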
