Early Performance Evaluation of the Hybrid Cluster with Torus Interconnect Aimed at Molecular-Dynamics Simulations

In this paper, we describe the Desmos cluster, which consists of 32 hybrid nodes connected by a low-latency, high-bandwidth torus interconnect. The cluster is aimed at cost-effective classical molecular dynamics calculations. We present strong scaling benchmarks for GROMACS, LAMMPS and VASP and compare the results with other HPC systems. The cluster also serves as a test bed for the Angara interconnect, which supports 3D and 4D torus network topologies, and verifies its ability to unite MPP systems and to speed up MPI-based applications effectively. We describe the interconnect and present typical MPI benchmarks.
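For readers unfamiliar with the term, a strong scaling benchmark keeps the problem size fixed and measures how the time to solution changes as nodes are added. The minimal Python sketch below shows how speedup and parallel efficiency are commonly derived from such timings. The function name `strong_scaling` and the timings in the usage example are hypothetical illustrations, not measurements from Desmos or any other system.

```python
def strong_scaling(times_by_nodes):
    """Compute speedup and parallel efficiency for a fixed-size problem.

    times_by_nodes maps a node count to the wall-clock time measured on
    that many nodes. The smallest node count is taken as the baseline.
    Returns {nodes: (speedup, efficiency)}.
    """
    base_nodes = min(times_by_nodes)
    base_time = times_by_nodes[base_nodes]
    result = {}
    for nodes, elapsed in sorted(times_by_nodes.items()):
        speedup = base_time / elapsed                 # how much faster than baseline
        efficiency = speedup / (nodes / base_nodes)   # 1.0 means ideal scaling
        result[nodes] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    example = {1: 100.0, 2: 50.0, 4: 40.0}
    for nodes, (s, e) in strong_scaling(example).items():
        print(f"{nodes} nodes: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency that falls well below 1.0 as nodes are added usually indicates that communication cost is starting to dominate, which is where interconnect latency and bandwidth matter most.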
