HPCC RandomAccess benchmark for next generation supercomputers

In this paper we examine the key elements determining the performance of the HPC Challenge RandomAccess benchmark on next generation supercomputers. We find that the performance of this benchmark is closely related to the bisection bandwidth of the underlying communication network, performance of integer divide operation and details of benchmark specifications such as error tolerance and permissible multi-core mapping strategies. We demonstrate that seemingly small and innocuous changes in the benchmark can lead to significantly different system performance. We also present an algorithm to optimize RandomAccess benchmark for multi-core systems. Our algorithm uses aggregation and software routing and balances the load on the cores by specializing each of the cores for one specific routing or update function. This algorithm gives approximately a factor of 3 speedup on the Blue Gene/P system which is based on quad-core nodes.

[1]  Hector Garcia-Molina,et al.  Main Memory Database Systems: An Overview , 1992, IEEE Trans. Knowl. Data Eng..

[2]  Ibm Blue,et al.  Overview of the IBM Blue Gene/P Project , 2008, IBM J. Res. Dev..

[3]  Jack J. Dongarra,et al.  The LINPACK Benchmark: past, present and future , 2003, Concurr. Comput. Pract. Exp..

[4]  Philip Heidelberger,et al.  Optimization of All-to-All Communication on the Blue Gene/L Supercomputer , 2008, 2008 37th International Conference on Parallel Processing.

[5]  Kenneth A. Ross,et al.  Adaptive Aggregation on Chip Multiprocessors , 2007, VLDB.

[6]  John A. Gunnels,et al.  Optimization of fast Fourier transforms on the Blue Gene/L supercomputer , 2008, HiPC'08.

[7]  Yogish Sabharwal,et al.  Software Routing and Aggregation of Messages to Optimize the Performance of HPCC Randomaccess Benchmark , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[8]  H. Peter Hofstee,et al.  Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..

[9]  Christian Jacobi,et al.  Highly Concurrent Locking in Shared Memory Database Systems , 1999, Euro-Par.

[10]  Jean-Francois Collard,et al.  The architecture of the HP Superdome shared-memory multiprocessor , 2005, ICS '05.

[11]  Patricia J. Teller,et al.  Proceedings of the 2008 ACM/IEEE conference on Supercomputing , 2008, HiPC 2008.

[12]  Eric M. Schwarz,et al.  IBM POWER6 microarchitecture , 2007, IBM J. Res. Dev..

[13]  David R. Karger,et al.  Koorde: A Simple Degree-Optimal Distributed Hash Table , 2003, IPTPS.

[14]  Philip Heidelberger,et al.  The deep computing messaging framework: generalized scalable message passing on the blue gene/P supercomputer , 2008, ICS '08.

[15]  Rupak Biswas,et al.  Scientific application-based performance comparison of SGI Altix 4700, IBM POWER5+, and SGI ICE 8200 supercomputers , 2008, HiPC 2008.

[16]  Subhash Saini,et al.  Application-based early performance evaluation of SGI altix 4700 systems for SGI systems , 2008, CF '08.

[17]  David Abramson,et al.  The Virtual Laboratory: a toolset to enable distributed molecular modelling for drug design on the World‐Wide Grid , 2003, Concurr. Comput. Pract. Exp..