BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution

In today's "Big Data" era, developers have adopted parallel I/O techniques such as MPI-IO, Parallel NetCDF, and HDF5 to attain the performance needed to manage the vast amounts of data that scientific applications produce. These techniques offer parallel access to shared datasets and provide optimizations such as data sieving and two-phase I/O to boost I/O throughput. However, most of them optimize the access pattern within a single file or file extent; few consider cross-file I/O optimizations. This paper explores the potential benefit of cross-file I/O aggregation. We propose a Bundle-based PARallel Aggregation framework (BPAR) and design three partitioning schemes under this framework that target the I/O performance of a mission-critical application, GEOS-5, as well as a broad range of other scientific applications. Our experimental results reveal that BPAR achieves on average a 2.1× performance improvement over the baseline GEOS-5.
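To make the idea of cross-file aggregation concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): rather than every process issuing many small per-variable writes, processes are partitioned into groups, and one aggregator per group bundles its members' variables into a single large contiguous write. The function names, the round-robin partitioning scheme, and the in-memory modeling of writes are all illustrative assumptions.

```python
# Hypothetical sketch of bundle-based cross-file aggregation.
# Assumption: each rank holds several small per-variable buffers that
# would otherwise each go to a separate file.

from typing import Dict, List


def partition(ranks: List[int], num_groups: int) -> List[List[int]]:
    """Split ranks round-robin into aggregation groups (one scheme of many)."""
    groups: List[List[int]] = [[] for _ in range(num_groups)]
    for i, r in enumerate(ranks):
        groups[i % num_groups].append(r)
    return groups


def aggregate_and_write(per_rank_vars: Dict[int, Dict[str, bytes]],
                        num_groups: int) -> List[bytes]:
    """Each group's aggregator concatenates its members' variable buffers
    and issues one bundled write (modeled here as one bytes object)."""
    groups = partition(sorted(per_rank_vars), num_groups)
    writes = []
    for group in groups:
        bundle = b"".join(buf
                          for rank in group
                          for buf in per_rank_vars[rank].values())
        writes.append(bundle)  # one large write replaces many small ones
    return writes
```

With four ranks and two groups, the eight small per-variable writes collapse into two bundled writes, which is the kind of request consolidation a cross-file aggregation framework relies on to improve throughput.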
