Enhance parallel input/output with cross-bundle aggregation

The exponential growth of computing power on leadership-scale computing platforms poses a grand challenge to the input/output (I/O) performance of scientific applications. To bridge the performance gap between computation and I/O, various parallel I/O libraries have been developed and adopted by computational scientists. These libraries enhance I/O parallelism by allowing multiple processes to concurrently access a shared data set. They also integrate I/O optimization strategies, such as data sieving and two-phase I/O, to better exploit the bandwidth supplied by the underlying parallel file system. Most of these techniques are optimized for access to a single bundle of variables, generated by the application during an I/O phase and stored as a single file; few focus on cross-bundle I/O optimization. In this article, we investigate the potential benefit of cross-bundle I/O aggregation. Based on an analysis of the I/O patterns of a mission-critical scientific application, the Goddard Earth Observing System, version 5 (GEOS-5), we propose a Bundle-based PARallel Aggregation (BPAR) framework with three partitioning schemes to improve its I/O performance, as well as that of a broad range of other scientific applications. Our experimental results reveal that BPAR delivers a 2.1× I/O performance improvement over the baseline GEOS-5 and shows promise for accelerating the I/O performance of scientific applications on various computing platforms.
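The abstract describes cross-bundle aggregation only at a high level, so the following is a minimal, illustrative MPI sketch of the general idea: each variable bundle is assigned to a different aggregator process, each aggregator gathers its bundle from all ranks, and each aggregator writes its bundle to a separate file so the per-bundle writes can proceed in parallel. The bundle count, block size, round-robin aggregator assignment, and file names are assumptions made for illustration; this is not the actual BPAR partitioning logic or the GEOS-5 I/O path.

```c
/*
 * Minimal sketch of bundle-based aggregation (illustrative assumptions only):
 * every rank holds one fixed-size block of each of NBUNDLES variable bundles;
 * bundle b is gathered onto an assumed aggregator rank (b % nprocs), which
 * writes that bundle to its own file.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NBUNDLES 3      /* assumed number of variable bundles        */
#define BLOCK    1024   /* assumed per-rank block size (in doubles)  */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank owns one block of every bundle (filled with dummy data). */
    double *bundles[NBUNDLES];
    for (int b = 0; b < NBUNDLES; b++) {
        bundles[b] = malloc(BLOCK * sizeof(double));
        for (int i = 0; i < BLOCK; i++)
            bundles[b][i] = rank + 0.001 * b;
    }

    for (int b = 0; b < NBUNDLES; b++) {
        /* Assumed round-robin assignment of bundles to aggregator ranks. */
        int root = b % nprocs;

        double *recvbuf = NULL;
        if (rank == root)
            recvbuf = malloc((size_t)nprocs * BLOCK * sizeof(double));

        /* Aggregate bundle b from every rank onto its aggregator. */
        MPI_Gather(bundles[b], BLOCK, MPI_DOUBLE,
                   recvbuf,    BLOCK, MPI_DOUBLE, root, MPI_COMM_WORLD);

        /* Each aggregator writes its bundle to a separate file, so writes
         * of different bundles land on different ranks and different files. */
        if (rank == root) {
            char fname[64];
            snprintf(fname, sizeof(fname), "bundle_%d.bin", b);
            FILE *fp = fopen(fname, "wb");
            if (fp) {
                fwrite(recvbuf, sizeof(double), (size_t)nprocs * BLOCK, fp);
                fclose(fp);
            }
            free(recvbuf);
        }
    }

    for (int b = 0; b < NBUNDLES; b++)
        free(bundles[b]);

    MPI_Finalize();
    return 0;
}
```

The point of the sketch is the decoupling: instead of all processes collectively writing every bundle into one shared file, each aggregator streams one bundle as a contiguous file, which is, loosely, the cross-bundle parallelism the article exploits.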
