
Exploiting hierarchical parallelisms for molecular dynamics simulation on multicore clusters

We have developed a scalable hierarchical parallelization scheme for molecular dynamics (MD) simulation on multicore clusters. The scheme exploits multilevel parallelism by combining: (1) internode parallelism using spatial decomposition via message passing; (2) intercore parallelism using cellular decomposition via multithreading with a master/worker model; and (3) data-level optimization via single-instruction multiple-data (SIMD) parallelism with various code transformation techniques. By using this hierarchy of parallelisms, the scheme exposes very high concurrency and data locality, thereby achieving: (1) an internode weak-scaling parallel efficiency of 0.985 on 106,496 BlueGene/L nodes (0.975 on 32,768 BlueGene/P nodes) and an internode strong-scaling parallel efficiency of 0.90 on 8,192 BlueGene/L nodes; (2) an intercore multithreading parallel efficiency of 0.65 for eight threads on a dual quad-core Xeon platform; and (3) a SIMD speedup of about 2 for problem sizes ranging from 3,072 to 98,304 atoms. Furthermore, the effect of the memory-access penalty on SIMD performance is analyzed, and an application-based SIMD analysis scheme is proposed to help programmers determine whether their applications are amenable to SIMDization.
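The first two levels of the hierarchy, and the efficiency metrics quoted above, can be illustrated with a small serial sketch. This is not the authors' code: it assumes a cubic periodic box of side `box`, a hypothetical process `grid` for the internode (spatial) level, and cells whose side is at least the cutoff `rc` for the intercore (cellular) level, so each atom is only checked against atoms in its own cell and the 26 adjacent cells. All function names are illustrative assumptions, and message passing, threading, and SIMD are deliberately left out.

```python
import itertools
from collections import defaultdict


def domain_of(pos, box, grid):
    """Internode level (hypothetical): map a position to its (i, j, k) process domain."""
    return tuple(min(int(x / box * g), g - 1) for x, g in zip(pos, grid))


def min_image_dist2(p, q, box):
    """Squared distance under the minimum-image convention in a cubic periodic box."""
    d2 = 0.0
    for a, b in zip(p, q):
        d = a - b
        d -= box * round(d / box)  # wrap to the nearest periodic image
        d2 += d * d
    return d2


def build_cells(positions, box, rc):
    """Intercore level: bin atom indices into n^3 cells whose side box/n is >= rc."""
    n = max(1, int(box // rc))
    cells = defaultdict(list)
    for idx, pos in enumerate(positions):
        cells[tuple(min(int(x / box * n), n - 1) for x in pos)].append(idx)
    return cells, n


def neighbor_pairs(positions, box, rc):
    """All pairs (i, j), i < j, closer than rc, searching only the 27 adjacent cells."""
    cells, n = build_cells(positions, box, rc)
    pairs = set()  # a set absorbs duplicate visits when n < 3
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            nb = ((cx + dx) % n, (cy + dy) % n, (cz + dz) % n)
            for i in members:
                for j in cells.get(nb, ()):  # .get does not insert empty cells
                    if i < j and min_image_dist2(positions[i], positions[j], box) < rc * rc:
                        pairs.add((i, j))
    return pairs


def weak_scaling_efficiency(t_base, t_p):
    """Work per node held fixed: E = T(base) / T(P)."""
    return t_base / t_p


def strong_scaling_efficiency(t_base, p_base, t_p, p):
    """Total work held fixed: E = (p_base * T(p_base)) / (p * T(p))."""
    return (p_base * t_base) / (p * t_p)
```

The cell size is chosen so that any pair within the cutoff lies in adjacent cells, which replaces an all-pairs search with a local one and keeps the atoms a thread touches close together in memory. With the efficiency definitions above, the eight-thread figure corresponds to a speedup of roughly 0.65 × 8 = 5.2 over a single thread, assuming efficiency is measured relative to one thread.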
