SSDUP: a traffic-aware SSD burst buffer for HPC systems

Many high performance computing (HPC) applications are highly data intensive. Current HPC storage systems still use hard disk drives (HDDs) as their dominant storage devices, and HDDs suffer from disk head thrashing when serving random accesses. Newer storage devices such as solid state drives (SSDs), which handle random access much more efficiently, have been widely deployed as a buffer in front of HDDs in many production HPC systems. The burst buffer has also been proposed to manage SSD buffering of bursty write requests. Although a burst buffer can improve I/O performance in many cases, we find that it has limitations: it requires large SSD capacity, and it depends on the computation phase and the data flushing stage overlapping well. In this paper, we propose a scheme called SSDUP (a traffic-aware SSD burst buffer) that improves the burst buffer by addressing these limitations. To reduce the SSD capacity demand, we develop a novel traffic-detection method that detects randomness in the write traffic. Based on this method, only random writes are buffered in SSD; all other writes are deemed sequential and propagated to HDDs directly. To overcome the difficulty of perfectly overlapping the computation phase and the flushing stage, we propose a pipeline mechanism for the SSD buffer in which data buffering and data flushing are pipelined. Finally, to further improve the performance of buffering random writes in SSD, we convert the random writes to sequential writes in SSD by storing the data in a log structure, and we use an AVL tree to store the sequence information of the data. We have implemented a prototype of SSDUP on top of OrangeFS and performed an extensive experimental evaluation. The experimental results show that SSDUP improves write performance by more than 50% on average.
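
The two ideas of detecting randomness and of log-structured buffering with an ordered index can be illustrated with a minimal Python sketch. It is not the SSDUP implementation. The abstract does not specify the detection metric, so `random_fraction` is a hypothetical stand-in that counts requests that do not start where the previous one ended. The class `LogBuffer` and its methods are also illustrative names, and a sorted list maintained with `bisect` is used in place of the AVL tree the abstract mentions, since both keep entries ordered by file offset.

```python
import bisect


def random_fraction(requests):
    """Hypothetical randomness metric (an assumption, not from the paper):
    the share of requests that do not begin where the previous one ended.
    `requests` is a list of (file_offset, size) tuples in arrival order."""
    if len(requests) < 2:
        return 0.0
    jumps = sum(
        1
        for (off1, size1), (off2, _) in zip(requests, requests[1:])
        if off1 + size1 != off2
    )
    return jumps / (len(requests) - 1)


class LogBuffer:
    """Append-only buffer standing in for the SSD log (illustrative only)."""

    def __init__(self):
        self.log = bytearray()  # data stored in arrival order, so writes are sequential
        self.index = []         # sorted (file_offset, log_position, length) entries

    def write(self, file_offset, data):
        """Append data to the log and record where it belongs in the file."""
        position = len(self.log)
        self.log += data
        bisect.insort(self.index, (file_offset, position, len(data)))

    def flush(self):
        """Return (file_offset, data) pairs in ascending offset order and reset."""
        ordered = [
            (off, bytes(self.log[pos:pos + length]))
            for off, pos, length in self.index
        ]
        self.log = bytearray()
        self.index = []
        return ordered


# Usage: out-of-order writes are logged sequentially and flushed in offset order.
buf = LogBuffer()
for off, data in [(8, b"cc"), (0, b"aa"), (4, b"bb")]:
    buf.write(off, data)
flushed = buf.flush()
print(flushed)
```

In this sketch the flush step hands data to the slower device in ascending offset order, which is the property the ordered index is meant to provide; a balanced tree such as an AVL tree would give the same ordering with logarithmic insertion cost.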
