Using Simulation to Validate Performance of MPI(-IO) Implementations

Parallel file systems and MPI implementations aim to exploit the available hardware resources to achieve optimal performance. Because performance is influenced by many hardware and software factors, achieving it is a daunting task, and optimized communication and I/O algorithms therefore remain an active research topic. While the complexity of collective MPI operations is occasionally discussed in the literature, a theoretical assessment of the measured performance is de facto non-existent; the analysis conducted is typically limited to performance comparisons against previous algorithms.
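One way to make such a theoretic assessment concrete, shown here only as a hedged sketch and not as the approach taken in this paper, is to compare a measured collective against a simple analytic model. The C program below times MPI_Bcast and contrasts the measurement with a binomial-tree estimate under a latency/bandwidth (Hockney-style) model; the constants ALPHA and BETA, the payload size, and the repetition count are illustrative assumptions that would need calibration for a real system.

/* Minimal sketch: compare a measured MPI_Bcast time against a
 * binomial-tree latency/bandwidth (Hockney-style) estimate.
 * ALPHA and BETA are assumed, illustrative machine parameters. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ALPHA 2.0e-6        /* assumed per-message latency in seconds */
#define BETA  (1.0 / 1.0e9) /* assumed seconds per byte (~1 GB/s)     */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const size_t n = 1 << 20;          /* 1 MiB payload (illustrative) */
    char *buf = malloc(n);
    memset(buf, 0, n);

    /* Measurement: average over a few repetitions, synchronized start. */
    const int reps = 20;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++)
        MPI_Bcast(buf, (int)n, MPI_CHAR, 0, MPI_COMM_WORLD);
    double measured = (MPI_Wtime() - t0) / reps;

    /* Model: a binomial-tree broadcast needs ceil(log2(p)) rounds,
     * each costing alpha + n * beta under the Hockney model.  The MPI
     * library may use a different algorithm, so this is only a yardstick. */
    double rounds = ceil(log2((double)size));
    double predicted = rounds * (ALPHA + (double)n * BETA);

    if (rank == 0)
        printf("p=%d, n=%zu B: measured %.3e s, model %.3e s, ratio %.2f\n",
               size, n, measured, predicted, measured / predicted);

    free(buf);
    MPI_Finalize();
    return 0;
}

Such a ratio between measured and modeled time is exactly the kind of theoretic cross-check that, as argued above, is rarely reported alongside the usual comparison against earlier algorithms.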
