Automatically Selecting the Number of Aggregators for Collective I/O Operations

Optimizing collective I/O operations is of paramount importance for many data-intensive high performance computing applications. Despite the large number of algorithms published in the field, most current approaches tune each application scenario separately and do not offer a consistent, automatic method for identifying the internal parameters of collective I/O algorithms. Most notably, prior work optimizes the number of processes that actually touch the file, the so-called aggregators. This paper introduces a novel runtime approach that determines the number of aggregator processes to be used in a collective I/O operation based on the file view, the process topology, the per-process write saturation point, and the actual amount of data written in the collective write operation. The algorithm is evaluated on two different file systems with multiple benchmarks. In more than 80% of the test cases, our algorithm delivered performance close to the best performance obtained by hand-tuning the number of aggregators for each scenario.
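
To make the core idea concrete, the following is a minimal C/MPI sketch of the kind of decision the paper automates: deriving an aggregator count from the total amount of data in a collective write and a measured per-process write saturation point, capped by the number of processes. The function name, the capping heuristic, and the constants are illustrative assumptions, not the authors' implementation; the actual algorithm additionally accounts for the file view and the process topology, which are omitted here.

/*
 * Minimal sketch (assumptions, not the paper's implementation): pick an
 * aggregator count so that each aggregator writes at least
 * saturation_bytes, the smallest per-process write size at which the
 * file system's bandwidth saturates. The real algorithm also factors in
 * the file view and the process topology, omitted here for brevity.
 */
#include <mpi.h>
#include <stdio.h>

static int select_num_aggregators(MPI_Comm comm,
                                  long long total_bytes,      /* data in this collective write */
                                  long long saturation_bytes) /* per-process write saturation point */
{
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    if (saturation_bytes <= 0)
        return 1; /* no measurement available: fall back to one aggregator */

    /* Use enough aggregators that each writes at least saturation_bytes... */
    long long n = total_bytes / saturation_bytes;
    if (n < 1)
        n = 1;

    /* ...but never more aggregators than there are processes. */
    if (n > nprocs)
        n = nprocs;

    return (int)n;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Example: a 1 GiB collective write with a 32 MiB saturation point. */
    int aggregators = select_num_aggregators(MPI_COMM_WORLD,
                                             1LL << 30, 32LL << 20);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("selected %d aggregators\n", aggregators);

    MPI_Finalize();
    return 0;
}

The intuition behind such a heuristic is that once every aggregator has at least a saturation point's worth of data, adding more aggregators fragments the writes without adding bandwidth, while using fewer leaves file-system bandwidth unused.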
