Benchmark Analysis and Application Results for Lattice Boltzmann Simulations on NEC SX Vector and Intel Nehalem Systems

Classic vector systems have all but vanished from recent TOP500 lists. Looking at the recently introduced NEC SX-9 series, we benchmark its memory subsystem using the low-level vector triad and employ the kernel of an advanced lattice Boltzmann flow solver to demonstrate that classic vector systems still combine excellent performance with a well-established optimization approach. To investigate multi-node performance, the flow field in a real porous medium is simulated using the hybrid MPI/OpenMP parallel ILBDC lattice Boltzmann application code. Results for a commodity Intel Nehalem-based cluster are provided for comparison. Clusters can keep up with the vector systems, but they require massive parallelism and thus much more effort to provide a good domain decomposition.
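
For readers unfamiliar with the vector triad mentioned above, the kernel is commonly written as a(i) = b(i) + c(i) * d(i): two floating-point operations per element against three loads and one store, which makes it a probe of memory bandwidth rather than arithmetic throughput. The following is a minimal Python sketch of that kernel and its nominal memory traffic, not the benchmark code used in the paper. The function names, the 8-byte word size, and the optional write-allocate accounting are assumptions made for illustration, and a pure-Python loop says nothing about the hardware rates of either system.

```python
import time


def vector_triad(b, c, d):
    """Return a with a[i] = b[i] + c[i] * d[i] (the vector triad kernel)."""
    return [bi + ci * di for bi, ci, di in zip(b, c, d)]


def triad_traffic_bytes(n, word_bytes=8, write_allocate=False):
    """Nominal memory traffic for n elements.

    Assumption: 3 loads + 1 store per element; a write-allocate cache
    adds one extra load for the store target (5 words in total).
    """
    words_per_element = 5 if write_allocate else 4
    return n * words_per_element * word_bytes


def measure(n, repeats=5):
    """Time the kernel and return (result, flop/s, nominal bytes/s)."""
    b, c, d = [1.0] * n, [2.0] * n, [0.5] * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a = vector_triad(b, c, d)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-12)  # guard against a zero timer reading
    flops = 2 * n  # one multiply and one add per element
    return a, flops / best, triad_traffic_bytes(n) / best


if __name__ == "__main__":
    _, flop_rate, byte_rate = measure(100_000)
    print(f"{flop_rate / 1e6:.1f} MFlop/s, {byte_rate / 1e6:.1f} MB/s (nominal)")
```

Under these assumptions the kernel moves 32 bytes for every 2 flops, or 40 bytes on a write-allocate cache, so the attainable rate on real hardware is bounded by memory bandwidth long before the arithmetic units saturate.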
