Optimizing the SSD Burst Buffer by Traffic Detection

HPC storage systems still use hard disk drives (HDDs) as their dominant storage devices, with solid state drives (SSDs) widely deployed as a buffer in front of the HDDs. The burst buffer has been proposed to manage SSD buffering of bursty write requests. Although a burst buffer can improve I/O performance in many cases, we find that it has limitations: it requires large SSD capacity, and it depends on the computation phase and the data flushing phase overlapping well. In this article, we propose a scheme called SSDUP+, which aims to improve the burst buffer by addressing these limitations. First, to reduce the demand for SSD capacity, we develop a novel method to detect and quantify the data randomness in the write traffic, together with an adaptive algorithm that classifies random writes dynamically. As a result, much less SSD capacity is required to achieve performance similar to that of other burst buffer schemes. Next, to overcome the difficulty of perfectly overlapping the computation phase and the flushing phase, we propose a pipeline mechanism for the SSD buffer, in which data buffering and flushing are performed in a pipelined fashion. In addition, to improve I/O throughput, we adopt a traffic-aware flushing strategy that reduces I/O interference on the HDDs. Finally, to further improve the performance of buffering random writes, SSDUP+ transforms random writes into sequential writes on the SSD by storing the data in a log structure, and uses an AVL tree to store the sequence information of the data. We have implemented a prototype of SSDUP+ based on OrangeFS and conducted extensive experiments. The experimental results show that SSDUP+ saves an average of 50% of SSD space while delivering almost the same performance as other common burst buffer schemes. In addition, SSDUP+ saves about 20% of SSD space compared with the previous version of this work, SSDUP, while achieving 20–30% higher I/O throughput than SSDUP.
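The log-structured buffering idea can be illustrated with a minimal Python sketch. This is not the SSDUP+ implementation: the class name LogBuffer, its methods, and the in-memory bytearray standing in for the SSD log are all hypothetical, and a sorted list maintained with the standard bisect module is swapped in for the AVL tree named in the abstract. Both structures keep file offsets in order, which is the property the sketch relies on. Overlapping writes are not handled.

```python
import bisect


class LogBuffer:
    """Append-only buffer with an offset-ordered index (hypothetical sketch)."""

    def __init__(self):
        self.log = bytearray()  # stands in for the SSD log region
        self.offsets = []       # file offsets, kept sorted (AVL tree stand-in)
        self.entries = {}       # file offset -> (log position, length)

    def write(self, file_offset, data):
        # Always append, so the buffer device only sees sequential writes.
        pos = len(self.log)
        self.log += data
        if file_offset not in self.entries:
            bisect.insort(self.offsets, file_offset)
        self.entries[file_offset] = (pos, len(data))

    def flush(self):
        # Yield buffered extents in ascending file-offset order for the HDD.
        for off in self.offsets:
            pos, length = self.entries[off]
            yield off, bytes(self.log[pos:pos + length])


# Random-order writes arrive, are logged sequentially, and flush in order.
buf = LogBuffer()
for off, data in [(8192, b"C"), (0, b"A"), (4096, b"B")]:
    buf.write(off, data)
flushed = list(buf.flush())
```

In this sketch the log holds the data in arrival order, while the index lets the flush walk the extents by ascending file offset, so the backing disk would receive an ordered stream even though the incoming requests were random.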
