Optimizing performance on modern HPC systems: learning from simple kernel benchmarks

We discuss basic optimization and parallelization strategies for current cache-based microprocessors (Intel Itanium2, Intel NetBurst and AMD64 variants) in single-CPU and shared-memory environments. Using selected kernel benchmarks representative of data-intensive applications, we focus on the attainable effective bandwidths, which remain suboptimal with current compilers. We stress the need for a careful OpenMP implementation, even for simple benchmark programs, to exploit the high aggregate memory bandwidth available on today's ccNUMA systems. If the quality of main memory access is the measure, classical vector systems such as the NEC SX6+ are still in a class of their own: they sustain the performance level of in-cache operations on modern microprocessors even with arbitrarily large data sets.
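To make the notion of "effective bandwidth" more concrete, the following is a minimal sketch of the bytes-per-iteration arithmetic for one simple streaming kernel. The kernel choice and all names are assumptions, not taken from the paper: a vector triad A(i) = B(i) + C(i) * D(i) on 8-byte double-precision arrays, running on a cache with a write-allocate policy, so that each store to A first reads A's cache line from memory. The helper functions are hypothetical illustrations.

```python
# Hypothetical bandwidth model for the vector triad A(i) = B(i) + C(i) * D(i).
# Assumptions (not from the abstract): 8-byte double-precision elements and a
# write-allocate cache, i.e. every store to A first reads A's cache line.

WORD_BYTES = 8  # size of one double-precision element


def vector_triad(a, b, c, d):
    """Reference kernel: three loads (b, c, d) and one store (a) per iteration."""
    for i in range(len(a)):
        a[i] = b[i] + c[i] * d[i]


def triad_traffic_bytes(n, write_allocate=True):
    """Bytes actually moved between memory and cache for one sweep over n elements."""
    loads = 3 * n * WORD_BYTES                        # B, C and D are read
    stores = n * WORD_BYTES                           # A is written back
    rfo = n * WORD_BYTES if write_allocate else 0     # extra read of A caused by write-allocate
    return loads + stores + rfo


def effective_bandwidth(n, seconds):
    """Bandwidth counting only the 32 bytes/iteration the kernel explicitly
    touches (3 loads + 1 store), ignoring any write-allocate traffic."""
    return 4 * n * WORD_BYTES / seconds


def actual_bandwidth(n, seconds, write_allocate=True):
    """Bandwidth based on the real memory traffic, including write-allocate reads."""
    return triad_traffic_bytes(n, write_allocate) / seconds
```

For example, with n = 10,000,000 elements and an assumed run time of 0.125 s, `effective_bandwidth` gives 2.56 GB/s, while `actual_bandwidth` gives 3.2 GB/s. Under these assumptions the effective figure is 4/5 of the real transfer rate whenever write-allocate is active, which is one reason a kernel can look bandwidth-starved even though the memory bus is busier than the effective figure suggests.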

Gerhard Wellein, Georg Hager, Thomas Zeiser, Jan Treibig