Accurately measuring collective operations at massive scale

Accurate, reproducible, and comparable measurement of collective operations is a complicated task. Although different measurement schemes are implemented in well-known benchmarks, many of these schemes introduce systematic errors into their measurements. We characterize these errors and select a window-based approach as the most accurate method. However, this approach complicates measurements significantly and introduces clock synchronization as a new source of systematic error. We analyze approaches to avoid or correct these errors and develop a scalable synchronization scheme for conducting benchmarks on massively parallel systems. Compared to the window-based scheme implemented in the SKaMPI benchmark, our scheme reduces synchronization overhead by a factor of 16 on 128 processes.
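The clock synchronization underlying window-based measurement rests on estimating the offset between two processes' clocks from round-trip probes. The following is a minimal sketch of that idea (the probe with the smallest round-trip time gives the least-distorted midpoint estimate); it is simulated locally with stand-in clock functions, and all names (`estimate_offset`, `local_clock`, `remote_clock`) are illustrative, not taken from the paper or from SKaMPI.

```python
import itertools

def estimate_offset(local_clock, remote_clock, rounds=10):
    """Estimate remote_clock() - local_clock() from round-trip probes.

    The remote timestamp is assumed to be taken at the midpoint of the
    round trip; the probe with the smallest round-trip time (RTT) is
    trusted most, since delays only inflate the RTT.
    """
    best_rtt = float("inf")
    best_offset = 0.0
    for _ in range(rounds):
        t_send = local_clock()
        t_remote = remote_clock()      # stands in for a ping-pong message
        t_recv = local_clock()
        rtt = t_recv - t_send
        if rtt < best_rtt:
            best_rtt = rtt
            best_offset = t_remote - (t_send + rtt / 2.0)
    return best_offset

# Simulated clocks: each call advances 0.1 ms; the remote runs 5 ms ahead.
ticks = itertools.count()
def local_clock():
    return next(ticks) * 1e-4

def remote_clock():
    return local_clock() + 5e-3        # fixed +5 ms skew

offset = estimate_offset(local_clock, remote_clock)
```

With these deterministic clocks the estimator recovers the injected 5 ms skew exactly; on a real network, variable message latency makes the minimum-RTT filtering essential.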
